rnajena / bertax_training

Training scripts for BERTax

About finaldataset #14

Closed yongrenr closed 4 months ago

yongrenr commented 5 months ago

Hello! I'm very interested in your work and would like to try your model's performance on the final dataset. I used final_model_dataset directly and processed it into json and txt files, but the script always asks me to pass in a Viruses dataset file. What should I do? Thank you!

The Bash command I used:

python -m models.bert_nc_finetune bert_nc_C2_final.h5 /home/dataset/final_model_dataset \
    --multi_tax \
    --epochs 15 \
    --batch_size 128 \
    --save_name _small_trainingset_filtered_fix_classes_selection \
    --store_predictions \
    --nr_seqs 1000000000

Traceback (most recent call last):
  File "/root/miniconda3/envs/bertax_37/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/root/miniconda3/envs/bertax_37/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/bertax_training-master/models/bert_nc_finetune.py", line 320, in <module>
    x, y, y_species = load_fragments(args.fragments_dir, balance=False, nr_seqs=args.nr_seqs)
  File "/home/bertax_training-master/models/bert_nc_finetune.py", line 44, in load_fragments
    fragmentsdir, f'{class}_fragments.json')))))
FileNotFoundError: [Errno 2] No such file or directory: '/home/dataset/final_model_dataset/Viruses_fragments.json'

yongrenr commented 4 months ago

Do you have a final classification model, and how do I run it? I look forward to hearing from you!

f-kretschmer commented 4 months ago

What exactly does your dataset directory (/home/dataset/final_model_dataset) look like? For the bert_nc_finetune script, the directory structure should be organized as described here. However, you can of course adapt the function load_fragments if you need a different structure. If, for example, you just don't want a specific class (like Viruses), you can simply change the classes variable, either directly in the bert_nc_finetune script or in models.model.PARAMS['classes'].
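As a rough illustration of the second option, the sketch below drops one class from a class list before training. The `PARAMS` dict is a local stand-in for `models.model.PARAMS`, its four class names are an assumption (only Viruses is confirmed by the traceback above), and `drop_classes` is a hypothetical helper that is not part of the repository.

```python
# Local stand-in for models.model.PARAMS; the class names are assumed.
PARAMS = {'classes': ['Viruses', 'Archaea', 'Bacteria', 'Eukaryota']}


def drop_classes(params, unwanted):
    """Return a copy of params whose 'classes' list omits the unwanted names."""
    unwanted = set(unwanted)
    updated = dict(params)
    updated['classes'] = [c for c in params['classes'] if c not in unwanted]
    return updated


if __name__ == '__main__':
    # Print the class list that remains after removing Viruses.
    print(drop_classes(PARAMS, ['Viruses'])['classes'])
```

The helper returns a modified copy and leaves the original dict untouched, so the change stays local to whichever script calls it.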

The final (pretrained and finetuned) BERTax model can be found here with detailed instructions on how to run it.

yongrenr commented 4 months ago

Hi! I studied your work again.

1. I want to use the "big_trainingset_all_fix_classes_selection.h5" model on the "test data" to see how well the test classification works, but there's a problem. What do I need to do? My dataset directory (/home/dataset/final_model_dataset) looks like this: [four images]

2. I seem to have found the model that will eventually be used for test classification, "big_trainingset_all_fix_classes_selection.h5". Is that right?

Thank you very much!!!!

f-kretschmer commented 4 months ago

I'm not sure if I understand what you want to do: If you want to run the model, don't use the finetune script but instead the scripts of the "bertax" repository. If you want to finetune (so train) the model, your data has to be organized in the way I described. Your "test" directory is organized in the wrong way; it should look like the "final" directory, meaning each class ("final" does not make sense here) has to have its sequences in the json file (e.g., Viruses_fragments.json for the class Viruses) and the taxonomic species IDs in the _species_picked.txt file (for Viruses: Viruses_species_picked.txt).
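A minimal sketch for checking that layout before starting a finetuning run is shown here. It only relies on the two file name patterns given above; the list of class names is an assumption (only Viruses appears in this thread), and `missing_files` is a hypothetical helper, not a function from the repository.

```python
import os

# Assumed class names; only "Viruses" is confirmed in this thread.
CLASSES = ['Viruses', 'Archaea', 'Bacteria', 'Eukaryota']


def missing_files(fragments_dir, classes=CLASSES):
    """List the per-class files that are expected but absent from fragments_dir."""
    expected = []
    for c in classes:
        expected.append(f'{c}_fragments.json')      # sequences of the class
        expected.append(f'{c}_species_picked.txt')  # taxonomic species IDs
    return [name for name in expected
            if not os.path.isfile(os.path.join(fragments_dir, name))]


if __name__ == '__main__':
    # Path taken from the command earlier in this thread.
    for name in missing_files('/home/dataset/final_model_dataset'):
        print('missing:', name)
```

An empty result means every listed class has both files; any reported name points at the kind of FileNotFoundError shown in the traceback above.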

yongrenr commented 4 months ago

Thank u!!!!