phac-nml / irida

Canada’s Integrated Rapid Infectious Disease Analysis Platform for Genomic Epidemiology
https://irida.ca
Apache License 2.0

Question about analysis submissions #52

Closed mauddelagarde closed 6 years ago

mauddelagarde commented 6 years ago

I submitted several analyses (assembly and annotation) on several FASTQ files (E. coli genomes) almost two weeks ago, and they have not finished yet. There are no error messages and the state is "running", but they do not seem to progress any further. How long should I wait? What could be the problem?

apetkau commented 6 years ago

Thanks for reporting the issue. We are looking into it.

glwinsor commented 6 years ago

Hi, I manage the SFU instance of IRIDA and was away from my office for an extended period. The cluster that SFU IRIDA uses to run jobs had a performance problem around July 28th which caused a lot of jobs to become backed up. It looks like your jobs have now completed - except for a couple of the SNVPhyl analyses which failed (I'll look into this and get back).

mauddelagarde commented 6 years ago

Thank you for this answer. I will check the results.

Maud


glwinsor commented 6 years ago

Hi Maud,

I think I found the cause of the problem you are experiencing with SNVPhyl.

I noticed that one key step in the analysis was timing out after 8 hours, even though it usually completes in around 10 minutes. This led me to suspect there was something up with your reference.

I checked and it appears you uploaded the GCF_000831715.1_ASM83171v1_cds_from_genomic.fna.gz file containing nucleotide sequences for all of the CDS features while you need to upload the GCF_000831715.1_ASM83171v1_genomic.fna file containing the genomic DNA for all replicons.

Please give this a try and see if it helps with your analysis. I notice that you are using a draft genome for the reference. Are there any complete genomes that would be appropriate for this data set?

Geoff

apetkau commented 6 years ago

Thanks for looking into this @glwinsor. Note, for SNVPhyl a draft genome is fine so long as it's the complete genomic sequence (and not each CDS as a separate sequence).
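
To make the distinction above concrete, here is a minimal sketch of a sanity check you could run on a reference FASTA before uploading it. It is not part of IRIDA or SNVPhyl. The function names are hypothetical, and the `_cds_` header test is an assumption about how NCBI names records in `cds_from_genomic` files. A CDS file typically has thousands of short records, while a genomic file has a small number of long replicons or contigs.

```python
import gzip
from statistics import median


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA text lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize_reference(lines):
    """Summarize record count, median record length and CDS-like headers."""
    records = list(read_fasta(lines))
    lengths = [len(seq) for _, seq in records]
    return {
        "records": len(records),
        "median_length": median(lengths) if lengths else 0,
        # Assumption: NCBI CDS records carry "_cds_" in the header.
        "cds_like_headers": sum("_cds_" in h for h, _ in records),
    }


def open_fasta(path):
    """Open a plain or gzip-compressed FASTA file as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)
```

For example, `summarize_reference(open_fasta("reference.fna"))` on a genomic file should report few records with a large median length and no CDS-like headers. The thresholds for "few" and "large" are left to the reader.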

mauddelagarde commented 6 years ago

Thank you very much for your answer, glwinsor. I tried with a different reference file.

mauddelagarde commented 6 years ago

Hello, I am really sorry to bother you again. I tried SNVPhyl again with a different reference file, first a genomic.fna file and then a FASTA file of my own, but neither worked. Can you help me? The error message is: No phylip formatted alignment found in /project/6004808/irida/galaxy-database/files/029/dataset_29632.dat

Thank you for your time,
Maud

apetkau commented 6 years ago

It looks like one of the samples you are running through the pipeline (ECL22804) either did not upload correctly or has very little data since the read files are about 2 MB, whereas all other files are >100 MB. Could you try removing that sample from the set you run through SNVPhyl?
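
A size check like the one described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names and the 10% cutoff are hypothetical, and it compares file sizes on your own computer before upload rather than anything inside IRIDA.

```python
import os
from statistics import median


def flag_small_files(sizes, ratio=0.1):
    """Return the names whose size is below ratio * median size.

    sizes maps a file name to its size in bytes. The ratio is an
    arbitrary illustrative cutoff, not an IRIDA or SNVPhyl setting.
    """
    if not sizes:
        return []
    cutoff = ratio * median(sizes.values())
    return sorted(name for name, size in sizes.items() if size < cutoff)


def sizes_from_paths(paths):
    """Build the name-to-size mapping from files on disk."""
    return {path: os.path.getsize(path) for path in paths}
```

Calling `flag_small_files(sizes_from_paths(my_fastq_paths))` lists read files that are much smaller than the rest, which are worth re-uploading or leaving out of the run.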

glwinsor commented 6 years ago

Thanks Aaron,

I missed that issue you spotted. I also came across a cryptic error log message stating that a file was truncated. Let’s keep our fingers crossed and hope this fixes the problem.

Geoff

Geoff Winsor Lead Database Developer, Bioinformatics Brinkman Lab SFU Big Data Hub, Room 10924 Simon Fraser University, Burnaby, BC, Canada, V5A 1S6 Phone: 778-782-5097

mauddelagarde commented 6 years ago

Hello, I am sorry to bother you again, I have two questions

apetkau commented 6 years ago

Hello Maud,

  1. Do you mean you tried to download a file from NCBI (or some other resource) in fasta format and you want to upload this file to IRIDA as a reference genome but it's not working? If so, could you check for IUPAC ambiguity characters in the fasta file (e.g., Y for pyrimidine or R for a purine)? The only ambiguity character supported right now in a reference file is N.
  2. I'm not quite sure what you mean by downloading directly a file in fasta format from the samples? The output of the assembly pipeline (https://irida.corefacility.ca/documentation/user/tutorials/assembly/) should include both a genbank file (a *.gbk file) and a fasta file of the contigs (a *.fasta file) which you can download.
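
For the ambiguity-character check in point 1, a minimal sketch could look like the following. The function name is hypothetical, and treating lowercase bases the same as uppercase is an assumption (it is not stated here how IRIDA handles lowercase sequence).

```python
from collections import Counter

# Per point 1 above: A, C, G, T plus N are taken as the supported characters.
ALLOWED = set("ACGTN")


def unsupported_characters(lines):
    """Count sequence characters other than A, C, G, T and N.

    lines is an iterable of FASTA text lines; header lines are skipped.
    Assumption: lowercase bases are treated like uppercase ones.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue
        for ch in line.upper():
            if ch not in ALLOWED:
                counts[ch] += 1
    return dict(counts)
```

An empty result means the file contains only A, C, G, T and N. Otherwise the returned mapping shows which ambiguity codes occur and how often, for example with `unsupported_characters(open("reference.fasta"))`.
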

mauddelagarde commented 6 years ago

My purpose would be to create a sample and to download fasta and not fastq in the sample, but when I upload sequences, I cannot download fasta from my computer. Is this normal?

apetkau commented 6 years ago

Okay, that makes sense. IRIDA only supports storing the sequence reads (fastq) with a sample. You cannot transfer an assembled genome (fasta) to IRIDA to store along with a sample.

You can transfer the sequence reads (fastq) to a sample and have IRIDA do a genome assembly and link that assembly to the sample (https://irida.corefacility.ca/documentation/user/user/pipelines/#saving-pipeline-results-to-a-sample). But, you cannot upload your own assembled genome into IRIDA.