instructlab/sdg
Python library for Synthetic Data Generation
https://pypi.org/project/instructlab-sdg/
Apache License 2.0
23 stars
35 forks
issues
Newest
Incorrect identify system prompt while generate SDG
#406
alfandindarahmawan
opened
8 hours ago
1
Remove / replace spellcheck auxiliary instruction in knowledge pipeline
#405
bbrowning
opened
1 day ago
0
Sdg v0.6.0+ multiple knowledge sources fails to clone
#404
KodieGlosserIBM
opened
2 days ago
1
Add [End] to parser cleanup tags (backport #400)
#403
mergify[bot]
closed
2 days ago
0
refactor: Introduce jldump
#402
makelinux
opened
3 days ago
0
build(deps): bump step-security/harden-runner from 2.10.1 to 2.10.2
#401
dependabot[bot]
closed
1 day ago
0
Add [End] to parser cleanup tags
#400
abhi1092
closed
2 days ago
2
Add `[End]` to list of keywords to clean in knowledge pipeline config
#399
abhi1092
closed
2 days ago
2
refactor: generated_data as list
#398
makelinux
opened
4 days ago
0
Don't fail fast for unit and functional tests
#397
danmcp
closed
4 days ago
0
refactor: remove unused generate_data arguments
#396
makelinux
closed
3 days ago
0
Adjust to slack-github-action 2.0 api changes
#395
danmcp
closed
4 days ago
0
fix: missing regex from actionlint action (backport #390)
#394
mergify[bot]
opened
6 days ago
0
Delete .gitattributes
#393
khaledsulayman
closed
6 days ago
1
Docling models path (backport #362)
#392
mergify[bot]
closed
2 days ago
2
Prefer tesserocr over easyocr, if available (backport #369)
#391
mergify[bot]
closed
6 days ago
0
fix: missing regex from actionlint action
#390
nathan-weinberg
closed
6 days ago
2
ci: add large-size E2E CI job (backport #380)
#389
mergify[bot]
closed
6 days ago
1
Allow documents to be loaded locally for Knowledge Injection
#388
murthyrudra
opened
1 week ago
0
build(deps-dev): update pre-commit requirement from <4.0,>=3.0.4 to >=3.0.4,<5.0
#387
dependabot[bot]
closed
1 day ago
0
build(deps): bump DavidAnson/markdownlint-cli2-action from 17.0.0 to 18.0.0
#386
dependabot[bot]
closed
6 days ago
0
build(deps): bump slackapi/slack-github-action from 1.27.0 to 2.0.0
#385
dependabot[bot]
closed
4 days ago
2
Download tokenizer artifacts in CI instead of storing them in `tests/testdata/models`
#384
khaledsulayman
opened
1 week ago
0
Documentation Update for `docling_model_path`:
#383
aakankshaduggal
opened
1 week ago
0
Add Release Strategy Document
#381
khaledsulayman
closed
1 week ago
0
ci: add large-size E2E CI job
#380
nathan-weinberg
closed
1 week ago
2
Add proper typehints
#379
RobotSail
opened
1 week ago
0
fix: formatting error
#378
RobotSail
closed
1 week ago
0
fix: upsample the phase10 knowledge dataset
#377
RobotSail
closed
6 days ago
1
Add support for preference tuning pipelines
#376
ktam3
opened
1 week ago
2
Add tests for the datamixing ensuring all reqd datasets are mixed appropriately
#375
aakankshaduggal
opened
1 week ago
0
[Epic] Fully Utilize Docling V2 Capabilities
#374
ktam3
opened
1 week ago
0
[Epic] Reconcile ilab SDG and Research SDG 2.0
#373
ktam3
opened
1 week ago
0
Run the simple pipeline on small runners
#372
bbrowning
closed
1 week ago
2
Prepare release-v0.3 branch for backports
#371
bbrowning
closed
1 week ago
2
chore: rename 'basic-workflow-tests' to 'e2e-custom'
#370
bbrowning
closed
1 week ago
1
Prefer tesserocr over easyocr, if available
#369
bbrowning
closed
1 week ago
5
Data mix fix (backport #366)
#368
mergify[bot]
closed
1 week ago
9
Add simple and full knowledge pipeline functional tests
#367
bbrowning
opened
1 week ago
2
Data mix fix
#366
aakankshaduggal
closed
1 week ago
4
Data Mixing Phase 10 - knowledge pre-training dataset not getting mixed in
#365
khaledsulayman
closed
1 week ago
5
Check for tokenizer in downloaded models directory
#364
khaledsulayman
closed
1 week ago
7
Docling models path
#362
aakankshaduggal
closed
1 week ago
6
Only use CPU for the docling OCR models
#361
bbrowning
closed
1 week ago
1
AssertionErrors after starting the SDG
#360
acsankar
opened
1 week ago
1
Move a spurious print to a debug log message
#359
bbrowning
closed
1 week ago
0
Don't attempt batching with InstructLab's llama-cpp-python
#358
bbrowning
closed
1 week ago
2
Leaf nodes with empty sdg output
#357
acsankar
opened
1 week ago
1
Consolidate test sample documents into one subdir
#356
bbrowning
closed
1 week ago
1
Update E2E jobs to run SDG on different filetypes
#355
khaledsulayman
opened
1 week ago
0