sambanova/generative_data_prep
Apache License 2.0 · 58 stars · 7 forks
issues
#68 Jayr/bugfix/sha256 dir exists (snova-jayr, closed 9 months ago, 0 comments)
#67 Bugfix/ranjan/btd 2432 fix pipenv lock issues (snova-ranjanl, closed 10 months ago, 0 comments)
#66 apply chat_template (hongfenglu, closed 3 months ago, 1 comment)
#65 add pipenv primer (snova-ranjanl, closed 4 months ago, 9 comments)
#64 Adding in metadata pydantic model and verification checks during runtime (snova-jayr, closed 9 months ago, 1 comment)
#63 Move logging config to use python (snova-zoltanc, closed 10 months ago, 0 comments)
#62 No Protections for runtime (wall clock time) or RAM usage (snova-zoltanc, opened 10 months ago, 1 comment)
#61 Tokenization Is Not Optimal, should use batched encoding (snova-zoltanc, opened 10 months ago, 1 comment)
#60 Set add_special_tokens=False for completions. (snova-zoltanc, closed 8 months ago, 2 comments)
#59 adding in saving of model config (snova-jayr, closed 10 months ago, 0 comments)
#58 Feature/ranjan/btd 2387 package logging config (snova-ranjanl, closed 10 months ago, 0 comments)
#57 Cannot launch generative_data_prep from different directory (rraju1, closed 10 months ago, 1 comment)
#56 Implement absolute path to logging config file (snova-zoltanc, closed 10 months ago, 1 comment)
#55 add different time limits for issues and prs (snova-ranjanl, closed 10 months ago, 0 comments)
#54 KeyError during balancing if fewer lines in input file than splits (snova-leonz, closed 8 months ago, 1 comment)
#53 Token Metrics Incorrect For Large Datasets (snova-zoltanc, closed 7 months ago, 1 comment)
#52 update tokenization to have progress bar (snova-zoltanc, closed 8 months ago, 3 comments)
#51 Greedy Drop Test Case (snova-virens, closed 11 months ago, 0 comments)
#50 Greedy Truncate Right Bug (snova-virens, closed 11 months ago, 1 comment)
#49 Add category id rebalancing (snova-zoltanc, closed 11 months ago, 0 comments)
#48 Zoltanc/category id rebalance bug fix (snova-zoltanc, closed 11 months ago, 0 comments)
#47 Zoltanc/category id rebalance bug fix (snova-zoltanc, closed 11 months ago, 0 comments)
#46 Greedy Packing All Padding Tokens Bug Fix (snova-virens, closed 11 months ago, 0 comments)
#45 Zoltanc/logging (snova-zoltanc, closed 11 months ago, 0 comments)
#44 Track Metrics About Tokenization Results (snova-zoltanc, closed 11 months ago, 2 comments)
#43 Update instructions; enable codecov (snova-ranjanl, closed 10 months ago, 0 comments)
#42 use hatch for build (snova-ranjanl, closed 12 months ago, 0 comments)
#41 Debug circleci pipeline (snova-ranjanl, closed 12 months ago, 1 comment)
#40 Fix typo in README.md (eltociear, closed 12 months ago, 3 comments)
#39 Update padding tokens to use padding token from tokenizer if it exists (snova-zoltanc, closed 11 months ago, 1 comment)
#38 Fail Gracefully when Child Process Fails (snova-virens, closed 12 months ago, 0 comments)
#37 Tracking dataset metrics (snova-zoltanc, closed 11 months ago, 1 comment)
#36 Progress Bar (snova-zoltanc, closed 8 months ago, 1 comment)
#35 Documentation About Input Packing Config (snova-zoltanc, opened 1 year ago, 1 comment)
#34 Child processes killed silently, causes code to hang (snova-zoltanc, closed 11 months ago, 2 comments)
#33 Creation of jsonl files (snova-darshang, closed 3 months ago, 3 comments)
#32 Add documentation into the readme about how to save holdout evaluation or test data. (snova-zoltanc, closed 1 year ago, 0 comments)
#31 Add Category ID as new token metadata (snova-zoltanc, closed 1 year ago, 0 comments)
#30 Fix README to reference output_path correctly (snova-zoltanc, closed 1 year ago, 0 comments)
#29 reduce num_worker default to 16 (snova-zoltanc, closed 1 year ago, 0 comments)
#28 add in updates to the README to reflect the output format of pipeline (snova-zoltanc, closed 1 year ago, 0 comments)
#27 Fail out with error if the OS kills any of the tokenization processes. (snova-zoltanc, closed 12 months ago, 0 comments)
#26 Remove split data dir with shut.rmtree() instead of os.rmdir() (snova-zoltanc, closed 1 year ago, 0 comments)
#25 run all checks for all files (snova-ranjanl, closed 1 year ago, 1 comment)
#24 Fix incorrect exception cases (snova-zoltanc, closed 1 year ago, 0 comments)
#23 Delete split jsonl files and save hdf5 directly to output_path (snova-zoltanc, closed 1 year ago, 3 comments)
#22 Implement token class (snova-zoltanc, closed 1 year ago, 0 comments)
#21 add logo and adapt to dark or light mode (snova-zoltanc, closed 1 year ago, 0 comments)
#20 Zoltanc/fix flake8 (snova-zoltanc, closed 1 year ago, 0 comments)
#19 add app_logger (snova-johnl, closed 1 year ago, 1 comment)