yyoshiaki / VIRTUS2

A bioinformatics pipeline for viral transcriptome detection and quantification considering splicing.

How can I run createindex using a custom local virus database? #19

Closed yyoshiaki closed 1 year ago

yyoshiaki commented 1 year ago
Hi, thank you for your reply.

How can I run createindex using a custom local virus database?

All best Patrick.

Originally posted by @patrick-douglas in https://github.com/yyoshiaki/VIRTUS2/issues/17#issuecomment-1533037739

yyoshiaki commented 1 year ago

In V2.0.2, createindex_localref.cwl was added.

https://github.com/yyoshiaki/VIRTUS2/blob/v2.0.2/workflow/createindex_localref.cwl

patrick-douglas commented 1 year ago

Hi, thank you for your reply. I found a different solution that works: I created a local FTP server and put the virus database's FTP URL into the createindex.job.yaml file. However, because the custom virus database is so large (a 150GB FASTA file), I'm getting the following error:

...
May 04 15:14:51 ..... started STAR run
May 04 15:14:51 ... starting to generate Genome files
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
INFO [job star_index_virus] Max memory used: 13595MiB
WARNING [job star_index_virus] exited with status: 139
WARNING [job star_index_virus] completed permanentFail
WARNING [step star_index_virus] completed permanentFail
INFO [workflow ] completed permanentFail
...

After some searching, I found that I need to raise the STAR parameter limitGenomeGenerateRAM to 104454248032 (~100GB), but I don't know how to pass this parameter through the VIRTUS createindex script. I have 120GB of RAM available on my server. Any suggestions?

All Best Patrick.

yyoshiaki commented 1 year ago

Hi, the local FTP solution sounds nice!

And what a huge reference! createindex.cwl is a very simple step, so it may be simpler to build the index with STAR directly rather than modifying the cwl file.

mkdir STAR_index_customvirus
STAR --runMode genomeGenerate \
     --genomeDir STAR_index_customvirus \
     --genomeFastaFiles genome.fa \
     --genomeSAindexNbases 14 \
     --runThreadN 10 \
     --limitGenomeGenerateRAM 104454248032
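
For a reference this unusual, it may help to check the index parameters against the scaling rules in the STAR manual before starting a long run: `--genomeSAindexNbases` is suggested as min(14, log2(GenomeLength)/2 - 1), and for references with many sequences `--genomeChrBinNbits` is suggested as min(18, log2(max(GenomeLength/NumberOfReferences, ReadLength))). The sketch below computes both from a FASTA file. The function name `star_index_params` and the default read length of 100 are assumptions for illustration and are not part of VIRTUS2 or STAR.

```shell
#!/usr/bin/env bash
# Sketch: suggest STAR index parameters from a FASTA file, using the
# scaling rules described in the STAR manual.
# star_index_params is a hypothetical helper, not part of VIRTUS2 or STAR.

# Usage: star_index_params FASTA [READ_LENGTH]
star_index_params() {
    local fasta="$1"
    local read_len="${2:-100}"   # assumed read length; adjust to your data
    awk -v rl="$read_len" '
        /^>/ { n++; next }                           # header: count one sequence
        { gsub(/[ \t\r]/, ""); len += length($0) }   # sequence line: add its bases
        END {
            if (n == 0 || len == 0) {
                print "no sequences found" > "/dev/stderr"
                exit 1
            }
            # min(14, log2(GenomeLength)/2 - 1), rounded down
            sa = int(log(len) / log(2) / 2 - 1)
            if (sa > 14) sa = 14
            # min(18, log2(max(GenomeLength/NumberOfReferences, ReadLength)))
            per = len / n
            if (per < rl) per = rl
            bits = int(log(per) / log(2))
            if (bits > 18) bits = 18
            printf "genome_length=%.0f\n", len
            printf "n_sequences=%.0f\n", n
            printf "genomeSAindexNbases=%d\n", sa
            printf "genomeChrBinNbits=%d\n", bits
        }' "$fasta"
}
```

Calling `star_index_params genome.fa 100` prints four `key=value` lines. The suggested values can then be passed to the `STAR --runMode genomeGenerate` command above. Lowering `--genomeChrBinNbits` is the adjustment the manual describes for references with a large number of sequences, but whether it brings the memory requirement within reach here would need to be tested.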

patrick-douglas commented 1 year ago

Hi, after some tries, I've noticed that the required RAM is too much for my PC, so I'm trying to split my virus FASTA file into multiple parts and run each part separately. Do you think running it this way is okay, or could it cause bias during detection?

All best Patrick.
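
The splitting step described above could be sketched as follows. This is only an illustration: the function name `split_fasta`, the `.partNNN.fa` naming, and the choice to split by a base-count threshold are all assumptions. It keeps every record whole, so a chunk can exceed the threshold by up to one record. It does not address the bias question, which this thread leaves open.

```shell
#!/usr/bin/env bash
# Sketch: split a FASTA file into chunks at record boundaries so that each
# chunk can be indexed separately. split_fasta is a hypothetical helper.

# Usage: split_fasta FASTA MAX_BASES_PER_CHUNK OUTPUT_PREFIX
split_fasta() {
    local fasta="$1" max_bases="$2" prefix="$3"
    awk -v max="$max_bases" -v prefix="$prefix" '
        function open_next() {
            if (out != "") close(out)      # release the previous chunk file
            part++
            out = sprintf("%s.part%03d.fa", prefix, part)
            bases = 0
        }
        /^>/ {
            # Start a new chunk only at a header, so no record is ever split.
            if (part == 0 || bases >= max) open_next()
            print > out
            next
        }
        {
            if (part == 0) next            # ignore lines before the first header
            print > out
            bases += length($0)
        }
    ' "$fasta"
}
```

For example, `split_fasta genome.fa 20000000000 virus_chunk` would write `virus_chunk.part001.fa`, `virus_chunk.part002.fa`, and so on, each holding roughly 20 billion bases. Each chunk would then need its own `STAR --runMode genomeGenerate` run and its own `--genomeDir`.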

yyoshiaki commented 1 year ago

Sorry, I don't have any information about this situation. Could you raise it in the issues of the STAR repo? Also, I did not anticipate such a huge reference, so I'm not sure whether using STAR to align viral reads against it is suitable...