issues
instructlab/sdg
Python library for Synthetic Data Generation
https://pypi.org/project/instructlab-sdg/
Apache License 2.0
23 stars, 35 forks
Unit Testing for Document Chunkers
#354
khaledsulayman
opened
1 week ago
0
Enable support for docling V2 supported filetypes
#353
khaledsulayman
opened
2 weeks ago
1
Prefer tesserocr vs easyocr for Docling integration, when available
#352
bbrowning
closed
1 week ago
2
Remove unnecessary requirement for qna.yaml in ContextAwareChunker
#351
khaledsulayman
closed
1 week ago
0
Use Docling v2 hierarchical chunking instead of the existing context-aware chunking implementation
#350
jwm4
opened
2 weeks ago
0
Upgrade docling, expand chunking testing
#349
bbrowning
closed
1 week ago
4
build(deps): update torch requirement from <2.5.0,>=2.3.0 to >=2.3.0,<2.6.0
#348
dependabot[bot]
closed
1 week ago
4
Move to Docling v2 APIs
#347
bbrowning
closed
2 weeks ago
0
[Epic] Testing for Document Chunking
#346
khaledsulayman
opened
2 weeks ago
0
Migrate to docling v2 json format
#344
khaledsulayman
closed
2 weeks ago
1
Enable Tokenizer Loading from downloaded Teacher Model
#343
khaledsulayman
closed
1 week ago
4
build(deps): bump pypa/gh-action-pypi-publish from 1.12.0 to 1.12.2
#342
dependabot[bot]
closed
2 weeks ago
0
feat: support converting messages datasets into multiple pre-training formats
#341
jaideepr97
closed
2 weeks ago
3
feat: expose max_num_tokens as configurable
#340
cdoern
closed
2 weeks ago
5
feat: parametrize system prompt
#339
jaideepr97
closed
2 weeks ago
7
build(deps): bump rojopolis/spellcheck-github-actions from 0.44.0 to 0.45.0
#338
dependabot[bot]
closed
2 weeks ago
0
build(deps): bump pypa/gh-action-pypi-publish from 1.11.0 to 1.12.0
#337
dependabot[bot]
closed
2 weeks ago
0
feat: Allow support for user supplied system prompts
#336
jaideepr97
closed
2 weeks ago
0
Docling models path
#335
aakankshaduggal
closed
1 week ago
0
Chunking Refactor: Always use Context-Aware Chunker
#334
aakankshaduggal
opened
2 weeks ago
3
Move to docling v2 for PDF support
#333
aakankshaduggal
closed
2 weeks ago
1
build(deps): bump rhysd/actionlint from 1.7.3 to 1.7.4 in /.github/workflows
#332
dependabot[bot]
closed
2 weeks ago
0
Repo needs `CHANGELOG.md` document
#331
nathan-weinberg
opened
3 weeks ago
1
Repo needs `release-strategy.md` document
#330
nathan-weinberg
closed
1 week ago
4
build(deps): bump hynek/build-and-inspect-python-package from 2.9.0 to 2.10.0
#329
dependabot[bot]
closed
2 weeks ago
0
build(deps): bump machulav/ec2-github-runner from 2.3.6 to 2.3.7
#328
dependabot[bot]
closed
2 weeks ago
0
build(deps): bump pypa/gh-action-pypi-publish from 1.10.3 to 1.11.0
#327
dependabot[bot]
closed
3 weeks ago
0
Quality of SDG and training results are affected by the use of non-ASCII characters
#363
AlexonOliveiraRH
opened
3 weeks ago
0
build(deps): bump rojopolis/spellcheck-github-actions from 0.43.1 to 0.44.0
#326
dependabot[bot]
closed
3 weeks ago
0
ci: convert med E2E CI job to L4 GPU
#325
nathan-weinberg
closed
4 weeks ago
0
[EPIC] Expand Knowledge Document Ingestion Pipeline
#324
aakankshaduggal
opened
4 weeks ago
0
build(deps): bump actions/setup-python from 5.2.0 to 5.3.0
#323
dependabot[bot]
closed
3 weeks ago
0
ci: use org variable for AWS EC2 AMI in E2E CI jobs
#322
nathan-weinberg
closed
4 weeks ago
0
build(deps): bump actions/checkout from 4.2.1 to 4.2.2
#321
dependabot[bot]
closed
2 weeks ago
2
build(deps): bump actions/cache from 4.1.1 to 4.1.2
#320
dependabot[bot]
closed
4 weeks ago
0
fix: medium E2E CI job was missing HF_TOKEN
#319
nathan-weinberg
closed
1 month ago
3
ci: update medium job to run as PR check
#318
nathan-weinberg
closed
1 month ago
2
ci: update small E2E job to align with CLI and Training
#317
nathan-weinberg
closed
1 month ago
1
llama-cpp multi server support
#316
cdoern
opened
1 month ago
5
map mistral model name to mixtral
#315
cdoern
closed
1 month ago
0
build(deps): bump rojopolis/spellcheck-github-actions from 0.43.0 to 0.43.1
#314
dependabot[bot]
closed
1 month ago
0
Gather data on model performance per SDG Pipeline
#313
ktam3
opened
1 month ago
1
Allow for SDG & Evaluation to have unique environments, filesystems
#312
sallyom
opened
1 month ago
1
fix: remove stop token from mixtral (backport #310)
#311
mergify[bot]
closed
1 month ago
0
fix: remove stop token from mixtral
#310
cdoern
closed
1 month ago
2
chore: rename 'basic-workflow-tests' to 'e2e-custom' (backport #306)
#309
mergify[bot]
closed
1 month ago
1
fix: change "group" to "tag" for mmlu_branch task config (backport #305)
#308
mergify[bot]
closed
1 month ago
3
SDG agentic pipeline documentation
#307
relyt0925
opened
1 month ago
5
chore: rename 'basic-workflow-tests' to 'e2e-custom'
#306
nathan-weinberg
closed
1 month ago
2
fix: change "group" to "tag" for mmlu_branch task config
#305
alimaredia
closed
1 month ago
2