Open ProkopDivin opened 1 year ago
@ProkopDivin Thank you for the feedback. I expect there will be some debugging ahead, since the repo was created recently and I have only tested the code on my local server. The code was the basis of my master's thesis and it should work; only a few minor changes that I overlooked still need to be dealt with. I apologize for any inconvenience.
This error is probably related to mounting your local volume into the image.
When you run the image as a container, you should specify which folder inside the image will be connected to your local storage.
I forgot to add an argument to the interactive execution of the image:
ch-run --bind /home/username/files:/app/output biopython bash
The problem shouldn't occur in the bash .sh scripts (you only need to replace the source path).
Please let me know if it resolves the issue.
I will add more information about how to work with the --bind argument.
I have made some changes to the instructions for interactive mode. Please have a look at them again.
Unfortunately, it doesn't.
[divinpr@volta05 protein_embeddings]$ ls
a.001.001.001_1s69a_A.fa compute_embeddings_gpu.sh README.md
biopython compute_protein_embeddings.py requirements.txt
compute_embeddings_cpu.sh Dockerfile
[divinpr@volta05 protein_embeddings]$ ch-run --bind /home/divinpr/pbsprediction/protein_embeddings:/app/output biopython bash
ch-run[706080]: error: can't mkdir: /home/divinpr/pbsprediction/protein_embeddings/biopython/app/output: Read-only file system (ch_misc.c:409 30)
[divinpr@volta05 protein_embeddings]$ ls biopython/
app boot dev home lib64 mnt proc run srv tmp var
bin ch etc lib media opt root sbin sys usr
[divinpr@volta05 protein_embeddings]$ ls biopython/app/
a.001.001.001_1s69a_A.fa compute_protein_embeddings.py requirements.txt
compute_embeddings_cpu.sh Dockerfile
compute_embeddings_gpu.sh README.md
The output directory just cannot be created.
@ProkopDivin
Can you create the directory manually?
mkdir biopython/app/output
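The error above suggests that ch-run could not create the bind destination because the unpacked image is read-only, so the destination directory has to exist inside the image beforehand. A minimal Python sketch of a pre-flight check, assuming an unpacked image directory such as biopython/ and bind strings of the form SRC:DST; the helper names are hypothetical and are not part of the repo or of Charliecloud:

```python
import os


def bind_destination(image_dir, dest):
    """Return the host path that corresponds to `dest` inside the unpacked image."""
    return os.path.join(image_dir, dest.lstrip("/"))


def missing_bind_destinations(image_dir, binds):
    """List bind destinations (from SRC:DST strings) that do not exist in the image."""
    missing = []
    for bind in binds:
        src, _, dst = bind.partition(":")
        # Assumption: with no explicit DST, the destination equals the source path.
        dst = dst or src
        if not os.path.isdir(bind_destination(image_dir, dst)):
            missing.append(dst)
    return missing
```

For the case in this thread, calling missing_bind_destinations("biopython", ["/home/divinpr/pbsprediction/protein_embeddings:/app/output"]) from the protein_embeddings directory would report /app/output until mkdir biopython/app/output has been run.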
Yes, but then the Python script cannot create the output files.
You should see the outputs here: /home/divinpr/pbsprediction/protein_embeddings (the source directory).
Well, the script ends with an error, so no files are created:
[divinpr@volta05 protein_embeddings]$ mkdir biopython/app/output
[divinpr@volta05 protein_embeddings]$ ch-run --bind /home/divinpr/pbsprediction/protein_embeddings:/app/output
divinpr@volta05:/$ ls
app bin boot ch dev etc home lib lib64 media mnt opt proc root run sbin srv sys tmp usr var
divinpr@volta05:/$ python ./app/output/compute_protein_embeddings.py --emb_name bert --input_dataset a.001.001.
Import embedder...
Some weights of the model checkpoint at /home/divinpr/.cache/bio_embeddings/prottrans_bert_bfd/model_directory were not used when initializing BertModel: ['cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.decoder.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Getting sequences from dataset ...
1 sequences found
Getting embeddings from bert
Traceback (most recent call last):
File "./app/output/compute_protein_embeddings.py", line 121, in <module>
with zipfile.ZipFile(f"{output_folder}/{dataset}_{emb_name}.zip","w") as thezip:
File "/usr/local/lib/python3.7/zipfile.py", line 1240, in __init__
self.fp = io.open(file, filemode)
FileNotFoundError: [Errno 2] No such file or directory: 'embeddings/a.001.001.001_1s69a_A_bert.zip'
You should specify a valid output folder as an argument.
python ./app/output/compute_protein_embeddings.py --emb_name bert --input_dataset ... --output_folder ~/pbsprediction/protein_embeddings
Or you can create a folder called embeddings inside your source directory; you should then find the output data there.
In that case, run:
python ./app/output/compute_protein_embeddings.py --emb_name bert --input_dataset ... --output_folder embeddings
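The FileNotFoundError in the traceback arises because zipfile.ZipFile does not create missing parent directories. A minimal sketch of how the write step could guard against that, assuming the same {output_folder}/{dataset}_{emb_name}.zip naming as in the traceback; the function name and the files argument are hypothetical, not the script's actual code:

```python
import os
import zipfile


def write_embeddings_zip(output_folder, dataset, emb_name, files):
    """Write `files` (name -> bytes or str) into {output_folder}/{dataset}_{emb_name}.zip."""
    # Create the output folder first; ZipFile will not create it for us.
    os.makedirs(output_folder, exist_ok=True)
    zip_path = os.path.join(output_folder, f"{dataset}_{emb_name}.zip")
    with zipfile.ZipFile(zip_path, "w") as thezip:
        for name, data in files.items():
            thezip.writestr(name, data)
    return zip_path
```

With a guard like this, the default embeddings folder would be created on first use instead of having to exist beforehand.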
This worked, thank you.
You're welcome.
When running this, the problem is that the argument named input_dataset has the value a.001.001.001_1s69a_A.fa, but for some reason the program prepends /app/output/ to it, so the path to the input file that is passed to the script as a parameter gets changed. I'm sure this is a mistake. It is probably in this line: input_dataset="/app/output/"+args.input_dataset
I'm not sure whether this is supposed to be the path to the output or whether there is some other intention. It also looks like there will be more bugs.
Considering the previous mistake, it looks like this code could never have run properly. Did you perhaps upload the wrong version or something like that? Can you try it yourself and debug it?
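One possible way to avoid the unconditional prefix is to prepend the bind-mounted directory only when the given path cannot be used as-is. A minimal sketch, assuming /app/output is the intended fallback location for input files; resolve_input_path is a hypothetical helper, not the script's actual code:

```python
import os


def resolve_input_path(input_dataset, base_dir="/app/output"):
    """Resolve the input file path without blindly prefixing base_dir."""
    # Absolute paths, and relative paths that already exist, are used as given.
    if os.path.isabs(input_dataset) or os.path.exists(input_dataset):
        return input_dataset
    # Otherwise fall back to looking inside the bind-mounted directory.
    return os.path.join(base_dir, input_dataset)
```

The script could then call resolve_input_path(args.input_dataset) in place of the string concatenation quoted above.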