sambanova/generative_data_prep
Apache License 2.0
58 stars
8 forks
issues
Readme Updates
#123
snova-zoltanc
closed
1 week ago
1
Deprecate loading tokenizer from gpt tokenizer class, this is confusing
#122
snova-zoltanc
closed
2 weeks ago
1
Lower transformers version to be compatible with studio
#121
snova-zoltanc
opened
2 weeks ago
1
[DO NOT MERGE] Testing
#120
davidzyx
closed
2 weeks ago
2
Make data prep accessible from python and callable with training hyper-parameters
#119
snova-zoltanc
closed
2 weeks ago
2
Update main to define arguments directly
#118
snova-zoltanc
closed
1 month ago
3
only grab num tokenized articles lock every 100 steps
#117
snova-zoltanc
closed
1 month ago
7
update the data prep repo to accept a directory of input jsonls
#116
snova-zoltanc
closed
2 months ago
2
Speedup Multiprocessing - decrease wait time for shared variable locks
#115
snova-zoltanc
closed
2 weeks ago
1
Studio Integration: Update Input Path To Accept Directory
#114
snova-zoltanc
closed
2 months ago
1
Studio Integration: Logging
#113
snova-zoltanc
opened
2 months ago
0
Studio integration: Dataset validation function
#112
snova-zoltanc
opened
2 months ago
0
Studio integration: update argument parsing so data prep can be called from Studio
#111
snova-zoltanc
closed
2 weeks ago
1
graceful error handling, new flag for ignoring format error
#110
snova-jerrym
closed
2 months ago
1
Integration of Data Prep Github Repo into Studio
#109
samratdeepprasad
closed
2 months ago
6
ignore torch vulnerability
#108
rrlamichhane
closed
3 months ago
1
add codecov
#107
rrlamichhane
closed
3 months ago
4
update requirements to only require python>=3.8.10
#106
snova-zoltanc
closed
3 months ago
0
Unable to install package
#105
snova-varunkrishna
closed
4 months ago
2
fix bug to add last turn in prompt completion pairs
#104
snova-zoltanc
closed
5 months ago
0
Zoltanc/apply chat template bug fix
#103
snova-zoltanc
closed
5 months ago
0
Apply Chat Template Error
#102
snova-zoltanc
closed
5 months ago
1
Support user assistant chat dataset format
#101
snova-zoltanc
opened
5 months ago
0
Check pipenv on existing lock, then check if lock will pass
#100
snova-ranjanl
closed
5 months ago
0
update pip lock files
#99
snova-zoltanc
closed
5 months ago
1
Improve the error handling for invalid jsonl line file input
#98
snova-zoltanc
closed
2 months ago
1
Update CONTRIBUTING.rst
#97
snova-ranjanl
closed
5 months ago
0
README examples default to --output_path of empty directory so that they do not fail out
#96
snova-zoltanc
closed
2 weeks ago
0
Clean up final print statements with training requirements
#95
snova-zoltanc
closed
6 months ago
0
Patch/ranjanl/add sentencepiece requirement
#94
snova-ranjanl
closed
6 months ago
0
Max Sequence Length - Tokens Clarifications
#93
snova-connorm
closed
6 months ago
0
Lower python requirements to python3.8 for data scale users
#92
snova-zoltanc
closed
6 months ago
0
Improve Decoder Instructions
#91
snova-connorm
closed
7 months ago
0
ENH - README Structure Reorganization
#90
snova-connorm
closed
7 months ago
1
Add Key Params Section
#89
snova-connorm
closed
7 months ago
0
Add Dataset Size Requirements Section
#88
snova-connorm
closed
7 months ago
7
update requirements to include sentencepiece
#87
snova-zoltanc
closed
6 months ago
1
Specify how to pick the number of training splits
#86
snova-zoltanc
closed
5 months ago
1
Create FAQ section
#85
snova-zoltanc
opened
8 months ago
1
Prompt_prefix not interpreted correctly
#84
snova-bol
opened
8 months ago
0
Snova chenw/enhance chat template
#83
snova-chenw
closed
5 months ago
3
tokenization time remaining not working
#82
snova-zoltanc
closed
6 months ago
1
update logic to check that prompt is not None
#81
snova-zoltanc
closed
9 months ago
0
Add pytests to ensure that tokenizer metrics are correct, including for larger datasets
#80
snova-zoltanc
closed
2 months ago
2
Fix bug where for larger datasets the metrics are incorrect
#79
snova-zoltanc
closed
9 months ago
1
Add pytests for EOS and BOS tokens, and ensuring Llama tokenizer works properly OOB
#78
snova-zoltanc
closed
5 months ago
1
Ensure no bos or eos added to completion, no eos added to prompt
#77
snova-zoltanc
closed
9 months ago
0
update pad token id to use tokenizer's padding token
#76
snova-zoltanc
closed
10 months ago
0
Update CONTRIBUTING.rst
#75
snova-ranjanl
closed
8 months ago
6
Update repo to require users to use python 3.9+
#74
snova-zoltanc
closed
11 months ago
0