meetU-MasterStudents / 2019---2020-partage

For exchanging material and doc

Psiblast against uniref90 on the cluster #17

Open florianecoulmance opened 5 years ago

florianecoulmance commented 5 years ago

Hello,

I have tried several times to run psiblast with one query against uniref90 on the cluster.

First, I run:

makeblastdb -in /shared/bank/uniref90/current/flat/uniref90.fasta -dbtype prot -out /shared/projects/meetu2019/fgayrard//Data/Uniref90/uniref90.fa

This gives me several files like these:

uniref90.fa.00.phr uniref90.fa.01.psq uniref90.fa.03.pin uniref90.fa.05.phr uniref90.fa.06.psq uniref90.fa.08.pin uniref90.fa.00.pin uniref90.fa.02.phr uniref90.fa.03.psq uniref90.fa.05.pin uniref90.fa.07.phr uniref90.fa.08.psq uniref90.fa.00.psq uniref90.fa.02.pin uniref90.fa.04.phr uniref90.fa.05.psq uniref90.fa.07.pin uniref90.fa.01.phr uniref90.fa.02.psq uniref90.fa.04.pin

etc ...

I do this step as the documentation indicates, and because I cannot run psiblast directly against the uniref90.fasta file in /shared/bank/uniref90/current/flat/uniref90.fasta (which seems to be a common problem, from what I found on the internet). Then I run:

psiblast -query /shared/projects/meetu2019/fgayrard/query1.fasta -db /shared/projects/meetu2019/fgayrard/Data/Uniref90/uniref90.fa -pseudocount 1 -num_iterations 3 -out /shared/projects/meetu2019/fgayrard/1queryMSA.psiblast

The error is an alias error.

Before that, I tried to make the database from the uniref90.fsa file in shared/bank/uniref90/current/fasta/uniref90.fsa and to run psiblast against all the files created (as I described above), but this does not seem to work either.

Does anyone have an idea?

Thank you for your help,

Floriane

annelopes commented 5 years ago

Hi Floriane,

Try removing the ".fa" extension, with something like -db /shared/projects/meetu2019/fgayrard/Data/Uniref90/uniref90

Please tell us if it works.

A.

florianecoulmance commented 5 years ago

Hello Anne,

I tried this already, but I still get the alias problem even when removing ".fa".

What is weird is that it works perfectly on the small subset of uniref I downloaded on my computer!

Floriane

annelopes commented 5 years ago

For me it works:

blastp -query /shared/projects/meetu2019/alopes/runtest/query1.fasta -db /shared/bank/uniref50/current/blast/uniref50

You have to provide the complete path where your db is stored + the rootname of the db (here uniref50).

Anyway, I don't understand why you need to build your own BLAST db for uniref50, since it already exists: shared/bank/uniref50/current/blast/ contains all the corresponding uniref50 files (.phr, .pin, .psd, etc.). This is precisely the purpose of these bank directories. So you don't have to provide a FASTA file, but rather the path to the directory containing the db encoded as .phr, .psd, etc. files (for uniref, again, this directory already exists and is stored in /shared/bank/).
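The path-plus-rootname rule described here can be checked before launching a job. The following is a minimal sketch, not part of BLAST+: the function name `blast_db_prefix_ok` is hypothetical, and it assumes a protein database, where a single-volume db has a `PREFIX.pin` index file and a multi-volume db has a `PREFIX.pal` alias file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if PREFIX looks like a protein BLAST db name.
# Assumption: single-volume dbs have PREFIX.pin, multi-volume dbs have PREFIX.pal.
blast_db_prefix_ok() {
    local prefix="$1"
    if [ -f "${prefix}.pal" ] || [ -f "${prefix}.pin" ]; then
        echo "ok: ${prefix}"
        return 0
    fi
    echo "no .pal or .pin file found for prefix: ${prefix}" >&2
    return 1
}

# Example (path taken from the comment above):
# blast_db_prefix_ok /shared/bank/uniref50/current/blast/uniref50
```

If the check fails, the value given to -db is probably not the rootname that the db files actually share. Where BLAST+ is available, `blastdbcmd -db PREFIX -info` is another way to confirm that a prefix resolves to a database.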

That said, uniref50 is big (about 18G) and the memory on each node is limited to 2G. So to run your blast on uniref50, you must add the following line at the beginning of your script:

#SBATCH --mem 20GB
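Put together, a job script for the blastp run above might look like the sketch below. The file name `blast_uniref50.sh` and the output file name are assumptions for illustration, and `--mem=20G` uses the `G` suffix form from the Slurm documentation. The sketch only writes the script; submitting it is left as a manual `sbatch` call.

```shell
#!/usr/bin/env bash
# Write a Slurm job script (hypothetical file name) without submitting it.
job=blast_uniref50.sh

cat > "$job" <<'EOF'
#!/bin/bash
#SBATCH --mem=20G
# Query and db paths are the ones quoted in this thread; the output name is an assumption.
blastp -query /shared/projects/meetu2019/alopes/runtest/query1.fasta \
       -db /shared/bank/uniref50/current/blast/uniref50 \
       -out query1_vs_uniref50.blastp
EOF

echo "wrote $job; submit it with: sbatch $job"
```

Depending on the cluster, the BLAST+ tools may first need to be made available in the job environment before the blastp line can run.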

Good luck,

Anne

florianecoulmance commented 5 years ago

Thank you Anne,

I was trying to do it with uniref90, not uniref50; for uniref90 there is no formatted BLAST db.

A week ago I ran makeblastdb on the .fsa file of uniref90. It worked, and the psiblast then seemed to start, but it failed with a memory error, which I tried to solve with the #SBATCH --mem parameter. I still had the memory problem. Is uniref90 just too big for psiblast to run on the cluster?

I will try with uniref50 and let you know the outcome!

Thank you for your help,

Floriane

annelopes commented 5 years ago

Indeed, you're right: for uniref90, you have to create the db with makeblastdb. Depending on its size (du -h dir_uniref90/), you have to adapt the memory you request with the flag #SBATCH --mem XXXGB. But I don't think it is a good idea on the cluster, since you won't have enough space to store the db in your directory. So it is better to run on uniref50 or uniprot, for instance.
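The sizing step above can be scripted. This is a rough sketch: the helper names and the 2 GB headroom are assumptions for illustration, not an IFB or BLAST recommendation, and how much memory a given search really needs may differ from the on-disk size.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: turn a db size into a candidate value for --mem.

# Round a size in KiB up to whole GiB and add a headroom (GiB, default 2).
mem_gb_for_kb() {
    local kb="$1" headroom="${2:-2}"
    echo $(( (kb + 1048575) / 1048576 + headroom ))
}

# Measure a directory with du (-s: total only, -k: KiB) and suggest a value.
suggest_mem_gb() {
    local kb
    kb=$(du -sk "$1" | cut -f1)
    mem_gb_for_kb "$kb"
}

# Example: suggest_mem_gb dir_uniref90/   then use the result in  #SBATCH --mem=<N>G
```

For instance, `mem_gb_for_kb 51380224` (49 GiB expressed in KiB) prints 51.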

annelopes commented 5 years ago

I am asking IFB whether they can add a formatted uniref90. I'll keep you posted.

florianecoulmance commented 5 years ago

The formatted uniref90 dataset is in my directory /shared/projects/meetu2019/fgayrard/Data/Uniref90 and its size is 49GB.

Maybe, now that I have it in my own directory, we can just copy it into the shared/bank/uniref90/current/blast folder?

annelopes commented 5 years ago

Otherwise, you can use the old version of uniref90 (2018); the results will be more or less the same (but it will be much more expensive in terms of computational time):

/shared/bank/uniref90/uniref90_2018-10-10/

annelopes commented 5 years ago

No, you can't write in shared/bank. Please use their old version.

florianecoulmance commented 5 years ago

Ok great! I will let you know the outcome.

Thank you very much for the advice!