Hi Daniele, can you please share the output of `pip freeze`?
Thanks for the help, below is the output.
I want to clarify that prediction on the validation set completes successfully, although the number of samples does not match (are some samples being dropped because they are too long to fit into the context of the 256-bart model?).
I've looked at the dataset downloaded by the script in `/.cache/huggingface/datasets/`, and it has the correct number of samples.
Also, fine-tuning the 256-bart model seems to have worked as expected.
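For reference, the split counts can also be double-checked directly with the public SCROLLS loader, independent of `run.py` (a minimal sketch):

```python
from datasets import load_dataset

# Load the SCROLLS Qasper subset; the split sizes can then be compared
# against the sample counts reported during prediction.
qasper = load_dataset("tau/scrolls", "qasper")
for split_name, split in qasper.items():
    print(split_name, len(split))
```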
```
absl-py==2.0.0
aiohttp==3.8.6
aiosignal==1.3.1
antlr4-python3-runtime==4.8
appdirs==1.4.4
async-timeout==4.0.3
attrs==23.1.0
bitarray==2.8.2
certifi==2023.7.22
cffi==1.16.0
charset-normalizer==3.3.0
click==8.1.7
colorama==0.4.6
Cython==3.0.3
datasets==1.17.0
dill==0.3.7
docker-pycreds==0.4.0
fairseq==0.12.2
filelock==3.12.4
frozenlist==1.4.0
fsspec==2023.9.2
gitdb==4.0.10
GitPython==3.1.37
huggingface-hub==0.18.0
hydra-core==1.0.7
idna==3.4
importlib-resources==6.1.0
joblib==1.3.2
lxml==4.9.3
multidict==6.0.4
multiprocess==0.70.15
nltk==3.8.1
numpy==1.24.4
omegaconf==2.0.6
packaging==23.2
pandas==2.0.3
pathtools==0.1.2
plotly==5.3.1
portalocker==2.8.2
protobuf==4.24.4
psutil==5.9.5
pyarrow==13.0.0
pycparser==2.21
python-dateutil==2.8.2
pytz==2023.3.post1
PyYAML==6.0.1
regex==2023.10.3
requests==2.31.0
rouge-score==0.1.2
sacrebleu==2.3.1
sacremoses==0.0.53
sentencepiece==0.1.99
sentry-sdk==1.32.0
setproctitle==1.3.3
six==1.16.0
smmap==5.0.1
tabulate==0.9.0
tenacity==8.2.3
tokenizers==0.10.3
torch==1.9.0+cu111
torchaudio==0.9.0
tqdm==4.66.1
transformers @ git+http://github.com/eladsegal/public-transformers@839ed93a19dc344e72cd1afe1b604addc74040bd
typing_extensions==4.8.0
tzdata==2023.3
urllib3==2.0.6
wandb==0.15.12
xxhash==3.4.1
yarl==1.9.2
zipp==3.17.0
```
I realized some answers to my two original questions:
1. `eval_dataset` contains 984 samples (correctly), but I see the following warning, which I'm not able to understand:
```
2023-10-17 04:45:09 | WARNING | datasets.fingerprint | Parameter 'function'=<function preprocess_function at 0x7fa38167d790> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
2023-10-17 04:45:09 | WARNING | datasets.arrow_dataset | Loading cached processed dataset at /home/ubuntu/.cache/huggingface/datasets/tau___scrolls/qasper/1.0.0/672021d5d8e1edff998a6ea7a5bff35fdfd0ae243e7cf6a8c88a57a04afb46ac/cache-1c80317fa3b1799d.arrow
```
2. For the test data, the deduplication function does not drop any data (I'm still seeing 1399 samples right after deduplication). However, the [pre-processing function for the test data](https://github.com/tau-nlp/scrolls/blob/1fb1042e66fd005b76fc5ad4557d31ed2bab61c7/baselines/src/run.py#L597) seems to do something wrong, because the size of the test data afterwards is 984 (suspiciously the same as the validation data). I also see a similar warning again:
```
2023-10-17 04:45:36 | WARNING | datasets.fingerprint | Parameter 'function'=<function preprocess_function at 0x7f4917b06ca0> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
2023-10-17 04:45:36 | WARNING | datasets.arrow_dataset | Loading cached processed dataset at /home/ubuntu/.cache/huggingface/datasets/tau___scrolls/qasper/1.0.0/672021d5d8e1edff998a6ea7a5bff35fdfd0ae243e7cf6a8c88a57a04afb46ac/cache-1c80317fa3b1799d.arrow
```
As you can see, the two warning messages show the same cached dataset being loaded.
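If I understand the `datasets` internals correctly, `.map()` pickles the transform function (via dill) to compute a deterministic fingerprint, and that fingerprint determines which `cache-*.arrow` file gets reused. A minimal sketch of checking whether a function hashes cleanly (the `preprocess_function` here is a stand-in, not the one from `run.py`):

```python
from datasets.fingerprint import Hasher

def preprocess_function(examples):
    # Stand-in for the real preprocess_function in baselines/src/run.py.
    return examples

try:
    # datasets pickles the function (via dill) to compute a deterministic
    # fingerprint; the fingerprint picks the cache-*.arrow file to reuse.
    print(Hasher.hash(preprocess_function))
except Exception as e:
    # If pickling fails, .map() falls back to a random hash and emits the
    # warning above, so cache lookups no longer match the actual transform.
    print("hashing failed:", e)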
By setting `load_from_cache_file=False` [here](https://github.com/tau-nlp/scrolls/blob/1fb1042e66fd005b76fc5ad4557d31ed2bab61c7/baselines/src/run.py#L603C27-L603C27), I now see 1399 predict samples.
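In code, the workaround amounts to forcing the map step to recompute rather than trust the cache; a minimal self-contained sketch with toy data (not the actual objects from `run.py`):

```python
from datasets import Dataset

# Toy stand-ins for the predict dataset and preprocess_function in run.py.
predict_dataset = Dataset.from_dict({"input": ["q1", "q2", "q3"]})

def preprocess_function(examples):
    return {"processed": [s.upper() for s in examples["input"]]}

predict_dataset = predict_dataset.map(
    preprocess_function,
    batched=True,
    load_from_cache_file=False,  # recompute instead of reusing a cache file
                                 # keyed by an unreliable fingerprint
)
```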
---
This leaves question 1 answered and question 2 temporarily worked around, though further inspection is needed to understand why `preprocess_function` does not load the proper cached predict dataset.
Hi Daniele, sorry for the delay in the response!
You understood the removal of duplicate inputs correctly.
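For anyone else reading along, a toy sketch of the deduplication idea being discussed, collapsing duplicate input strings before prediction (an illustration, not the actual code in `run.py`):

```python
from datasets import Dataset

# Made-up data; shows dropping repeated inputs before prediction.
ds = Dataset.from_dict({"id": ["1", "2", "3"], "input": ["q1", "q1", "q2"]})

seen = set()

def is_first_occurrence(example):
    # Keep only the first occurrence of each distinct input string.
    if example["input"] in seen:
        return False
    seen.add(example["input"])
    return True

deduped = ds.filter(is_first_occurrence)
print(len(ds), "->", len(deduped))  # 3 -> 2
```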
Regarding the warning and error you got, I found the issue:
Version 1.17.0 of `datasets` makes some modifications to `dill` that worked for version 0.3.4 but fail with newer versions, which caused the cache fingerprinting to fail and, as a result, caused the issues with the cache.
The fix would be to explicitly install the following dependencies:
```
dill==0.3.4
multiprocess==0.70.12.2  # newer versions require dill>0.3.4
```
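For example, in a plain pip environment, that amounts to running `pip install dill==0.3.4 multiprocess==0.70.12.2` before launching the baselines.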
I've also updated the repository accordingly. Thank you for bringing this to our attention!
Thanks for the help!
Hello,
I'm trying to replicate the fine-tuning results for the Qasper dataset baseline and the 256-bart model.
I see two issues when I try to generate predictions:
This is the command I'm using:
Could you please advise? Thanks!