sambanova generative_data_prep issues

sambanova / generative_data_prep

Apache License 2.0

58 stars, 8 forks

issues

Readme Updates

#123 snova-zoltanc closed 1 week ago
1
Deprecate loading tokenizer from gpt tokenizer class, this is confusing

#122 snova-zoltanc closed 2 weeks ago
1
Lower transformers version to be compatible with studio

#121 snova-zoltanc opened 2 weeks ago
1
[DO NOT MERGE] Testing

#120 davidzyx closed 2 weeks ago
2
Make data prep accessible from python and callable with training hyper-parameters

#119 snova-zoltanc closed 2 weeks ago
2
Update main to define arguments directly

#118 snova-zoltanc closed 1 month ago
3
only grab num tokenized articles lock every 100 steps

#117 snova-zoltanc closed 1 month ago
7
update the data prep repo to accept a directory of input jsonls

#116 snova-zoltanc closed 2 months ago
2
Speedup Multiprocessing - decrease wait time for shared variable locks

#115 snova-zoltanc closed 2 weeks ago
1
Studio Integration: Update Input Path To Accept Directory

#114 snova-zoltanc closed 2 months ago
1
Studio Integration: Logging

#113 snova-zoltanc opened 2 months ago
0
Studio integration: Dataset validation function

#112 snova-zoltanc opened 2 months ago
0
Studio integration: update argument parsing so data prep can be called from Studio

#111 snova-zoltanc closed 2 weeks ago
1
graceful error handling, new flag for ignoring format error

#110 snova-jerrym closed 2 months ago
1
Integration of Data Prep Github Repo into Studio

#109 samratdeepprasad closed 2 months ago
6
ignore torch vulnerability

#108 rrlamichhane closed 3 months ago
1
add codecov

#107 rrlamichhane closed 3 months ago
4
update requirements to only require python>=3.8.10

#106 snova-zoltanc closed 3 months ago
0
Unable to install package

#105 snova-varunkrishna closed 4 months ago
2
fix bug to add last turn in prompt completion pairs

#104 snova-zoltanc closed 5 months ago
0
Zoltanc/apply chat template bug fix

#103 snova-zoltanc closed 5 months ago
0
Apply Chat Template Error

#102 snova-zoltanc closed 5 months ago
1
Support user assistant chat dataset format

#101 snova-zoltanc opened 5 months ago
0
Check pipenv on existing lock, then check if lock will pass

#100 snova-ranjanl closed 5 months ago
0
update pip lock files

#99 snova-zoltanc closed 5 months ago
1
Improve the error handling for invalid jsonl line file input

#98 snova-zoltanc closed 2 months ago
1
Update CONTRIBUTING.rst

#97 snova-ranjanl closed 5 months ago
0
README examples default to --output_path of empty directory so that they do not fail out

#96 snova-zoltanc closed 2 weeks ago
0
Clean up final print statements with training requirements

#95 snova-zoltanc closed 6 months ago
0
Patch/ranjanl/add sentencepiece requirement

#94 snova-ranjanl closed 6 months ago
0
Max Sequence Length - Tokens Clarifications

#93 snova-connorm closed 6 months ago
0
Lower python requirements to python3.8 for data scale users

#92 snova-zoltanc closed 6 months ago
0
Improve Decoder Instructions

#91 snova-connorm closed 7 months ago
0
ENH - README Structure Reorganization

#90 snova-connorm closed 7 months ago
1
Add Key Params Section

#89 snova-connorm closed 7 months ago
0
Add Dataset Size Requirements Section

#88 snova-connorm closed 7 months ago
7
update requirements to include sentencepiece

#87 snova-zoltanc closed 6 months ago
1
Specify how to pick the number of training splits

#86 snova-zoltanc closed 5 months ago
1
Create FAQ section

#85 snova-zoltanc opened 8 months ago
1
Prompt_prefix not interpreted correctly

#84 snova-bol opened 8 months ago
0
Snova chenw/enhance chat template

#83 snova-chenw closed 5 months ago
3
tokenization time remaining not working

#82 snova-zoltanc closed 6 months ago
1
update logic to check that prompt is not None

#81 snova-zoltanc closed 9 months ago
0
Add pytests to ensure that tokenizer metrics are correct, including for larger datasets

#80 snova-zoltanc closed 2 months ago
2
Fix bug where for larger datasets the metrics are incorrect

#79 snova-zoltanc closed 9 months ago
1
Add pytests for EOS and BOS tokens, and ensuring Llama tokenizer works properly OOB

#78 snova-zoltanc closed 5 months ago
1
Ensure no bos or eos added to completion, no eos added to prompt

#77 snova-zoltanc closed 9 months ago
0
update pad token id to use tokenizer's padding token

#76 snova-zoltanc closed 10 months ago
0
Update CONTRIBUTING.rst

#75 snova-ranjanl closed 8 months ago
6
Update repo to require users to use python 3.9+

#74 snova-zoltanc closed 11 months ago
0