Open russellb opened 1 week ago
Same error happened again here -- https://github.com/russellb/ilab-runner/actions/runs/9831900354/job/27139983196
Note that after seeing how many knowledge samples we're processing, I switched the knowledge sample we're testing: https://github.com/instructlab/instructlab/pull/1620
I'm noting that here in case the problem disappears now that a smaller knowledge example is in place to speed up CI.
Another variation of a similar error -- https://github.com/instructlab/sdg/actions/runs/9846627527/job/27184863665
Looks like this is #99.
INFO 2024-07-08 20:59:39,250 pipeline.py:49: generate Dataset({
features: ['task_description', 'seed_context', 'seed_question', 'seed_response', 'context'],
num_rows: 62
})
ERROR 2024-07-08 20:59:39,250 block.py:37: _validate Missing key: 'num_samples'
Traceback (most recent call last):
File "/actions-runner/_work/sdg/sdg/venv/bin/ilab", line 8, in <module>
sys.exit(ilab())
^^^^^^
File "/actions-runner/_work/sdg/sdg/venv/lib64/python3.11/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/actions-runner/_work/sdg/sdg/venv/lib64/python3.11/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/actions-runner/_work/sdg/sdg/venv/lib64/python3.11/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/actions-runner/_work/sdg/sdg/venv/lib64/python3.11/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/actions-runner/_work/sdg/sdg/venv/lib64/python3.11/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/actions-runner/_work/sdg/sdg/venv/lib64/python3.11/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/actions-runner/_work/sdg/sdg/venv/lib64/python3.11/site-packages/click/decorators.py", line 33, in new_func
return f(get_current_context(), *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/actions-runner/_work/sdg/sdg/venv/lib64/python3.11/site-packages/instructlab/utils.py", line 551, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/actions-runner/_work/sdg/sdg/venv/lib64/python3.11/site-packages/instructlab/data/generate.py", line 194, in generate
generate_data(
File "/actions-runner/_work/sdg/sdg/venv/lib64/python3.11/site-packages/instructlab/sdg/generate_data.py", line 283, in generate_data
new_generated_data = sdg.generate(ds)
^^^^^^^^^^^^^^^^
File "/actions-runner/_work/sdg/sdg/venv/lib64/python3.11/site-packages/instructlab/sdg/sdg.py", line 19, in generate
dataset = pipeline.generate(dataset)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/actions-runner/_work/sdg/sdg/venv/lib64/python3.11/site-packages/instructlab/sdg/pipeline.py", line 58, in generate
dataset = self._drop_duplicates(dataset, cols=drop_duplicates_cols)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/actions-runner/_work/sdg/sdg/venv/lib64/python3.11/site-packages/instructlab/sdg/pipeline.py", line 25, in _drop_duplicates
df = dataset.to_pandas()
^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'to_pandas'
The error above is in the freeform flow, so it is not related to #99.
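Reading the log, it looks like a block logged `Missing key: 'num_samples'` during validation and then handed back `None`, which only blew up later in `_drop_duplicates`. Here is a minimal sketch of failing earlier with a clearer message. Everything in it is a hypothetical stand-in (`run_pipeline`, `sample_block`, `EmptyBlockOutputError`, and plain lists of dicts in place of a `Dataset`); it is not the actual `instructlab.sdg` code, and it assumes the block really does return `None` on a validation failure.

```python
# Hypothetical sketch: names are illustrative, not the instructlab.sdg API.
from typing import Callable, Optional

Rows = list[dict]
Block = Callable[[Rows], Optional[Rows]]


class EmptyBlockOutputError(RuntimeError):
    """Raised when a block returns None instead of a dataset."""


def sample_block(rows: Rows) -> Optional[Rows]:
    # Mimics the assumed failure mode: validation fails on a missing key
    # and the block returns None rather than raising.
    if any("num_samples" not in row for row in rows):
        return None
    return rows


def drop_duplicates(rows: Rows, cols: list[str]) -> Rows:
    # Keep the first row seen for each combination of values in `cols`.
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row.get(col) for col in cols)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def run_pipeline(rows: Rows, blocks: list[Block], dedup_cols: list[str]) -> Rows:
    for index, block in enumerate(blocks):
        result = block(rows)
        if result is None:
            # Fail here and name the block, instead of later during dedup.
            name = getattr(block, "__name__", repr(block))
            raise EmptyBlockOutputError(f"block {index} ({name}) returned None")
        rows = drop_duplicates(result, dedup_cols)
    return rows
```

With a guard like this, the CI failure would point at the block that produced no output rather than at `to_pandas()`. Whether the block should raise instead of returning `None` in the first place is a separate design question.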
I've been working on an end-to-end CI job that includes the full SDG pipeline. I saw this exception occur in one of the test runs.
https://github.com/russellb/ilab-runner/actions/runs/9830002619/job/27135724673