tseemann / nullarbor

:floppy_disk: :page_with_curl: "Reads to report" for public health and clinical microbiology
GNU General Public License v2.0
134 stars 37 forks source link

final report not created - error core.aln #234

Closed chrstraub closed 4 years ago

chrstraub commented 4 years ago

Hi Torsten,

I ran nullarbor2 (conda install) on 165 isolates, it seemingly runs fine, but doesn't create that final report (which is so handy). I had a look at the log, but I can't find a very obvious error message where it might fail.

The last few lines from the log are:

[user@bio2 nullarbor_2309]$ tail -50 nullarbor.log
Cluster with MCL
2019/09/23 18:50:21 Running command: pan_genome_post_analysis -o clustered_proteins -p pan_genome.fa -s gene_presence_absence.csv -c _clustered.clstr  -i /NGS/active/ARLNIL/NGO/analysis/Ng2018_survey/nullarbor_2309/roary/qFHj16iQqU//_gff_files -f /NGS/active/ARLNIL/NGO/analysis/Ng2018_survey/nullarbor_2309/roary/qFHj16iQqU//_fasta_files -t 11  --dont_create_rplots   -v   -j Local --processors 4 --group_limit 50000 -cd 99
Use of uninitialized value in require at /opt/admin/bioinf/anaconda3/anaconda3-5.0.0.1/envs/nullarbor2/lib/site_perl/5.26.2/x86_64-linux-thread-multi/Encode.pm line 61.
2019/09/23 18:50:23 Reinflate clusters
2019/09/23 18:50:24 Split groups with paralogs
2019/09/23 18:50:47 Labelling the groups
2019/09/23 18:50:47 Transfering the annotation to the groups
2019/09/23 18:51:31 Creating accessory binary gene presence and absence fasta
2019/09/23 18:51:43 Creating accessory binary gene presence and absence tree
2019/09/23 18:51:43 Running command: /opt/bioinf/anaconda3/anaconda3-5.0.0.1/envs/nullarbor2/bin/FastTree -fastest -nt accessory_binary_genes.fa > accessory_binary_genes.fa.newick
FastTree Version 2.1.10 Double precision (No SSE3)
Alignment: accessory_binary_genes.fa
Nucleotide distances: Jukes-Cantor Joins: balanced Support: SH-like 1000
Search: Fastest+2nd +NNI +SPR (2 rounds range 10) +ML-NNI opt-each=1
TopHits: 1.00*sqrtN close=default refresh=0.50
ML Model: Jukes-Cantor, CAT approximation with 20 rate categories
Initial topology in 0.13 seconds
Refining topology: 29 rounds ME-NNIs, 2 rounds ME-SPRs, 15 rounds ML-NNIs
      0.13 seconds: ME NNI round 1 of 29, 1 of 163 splits
      0.37 seconds: SPR round   1 of   2, 101 of 328 nodes
      0.54 seconds: SPR round   1 of   2, 201 of 328 nodes
      0.68 seconds: SPR round   1 of   2, 301 of 328 nodes
      0.85 seconds: SPR round   2 of   2, 101 of 328 nodes
      0.99 seconds: SPR round   2 of   2, 201 of 328 nodes
      1.10 seconds: SPR round   2 of   2, 301 of 328 nodes
Total branch-length 10.529 after 1.16 sec
      1.23 seconds: ML NNI round 1 of 15, 1 of 163 splits
      1.39 seconds: ML NNI round 1 of 15, 101 of 163 splits, 13 changes (max delta 18.059)
ML-NNI round 1: LogLk = -50309.341 NNIs 29 max delta 18.08 Time 1.50
      1.49 seconds: Site likelihoods with rate category 1 of 20
Switched to using 20 rate categories (CAT approximation)
Rate categories were divided by 0.951 so that average rate = 1.0
CAT-based log-likelihoods may not be comparable across runs
Use -gamma for approximate but comparable Gamma(20) log-likelihoods
      1.68 seconds: ML NNI round 2 of 15, 101 of 163 splits, 8 changes (max delta 2.751)
ML-NNI round 2: LogLk = -47501.905 NNIs 11 max delta 2.75 Time 1.74
      1.82 seconds: ML NNI round 3 of 15, 101 of 163 splits, 3 changes (max delta 9.218)
ML-NNI round 3: LogLk = -47487.454 NNIs 3 max delta 9.22 Time 1.84
ML-NNI round 4: LogLk = -47480.782 NNIs 2 max delta 3.36 Time 1.89
ML-NNI round 5: LogLk = -47480.558 NNIs 1 max delta 0.14 Time 1.92
ML-NNI round 6: LogLk = -47480.547 NNIs 0 max delta 0.00 Time 1.94
Turning off heuristics for final round of ML NNIs (converged)
      1.94 seconds: ML NNI round 7 of 15, 1 of 163 splits
      2.11 seconds: ML NNI round 7 of 15, 101 of 163 splits, 2 changes (max delta 3.636)
ML-NNI round 7: LogLk = -47458.861 NNIs 3 max delta 6.92 Time 2.23 (final)
      2.22 seconds: ML Lengths 1 of 163 splits
Optimize all lengths: LogLk = -47458.046 Time 2.29
      2.51 seconds: ML split tests for    100 of    162 internal splits
Total time: 2.66 seconds Unique: 165/165 Bad splits: 1/162 Worst delta-LogLk 2.164
make: Leaving directory `/NGS/active/ARLNIL/NGO/analysis/Ng2018_survey/nullarbor_2309'

After running it, I have the isolate folders + the following files + roary folder:

-rw-rw---- 1 user SEC_lab_ARLNIL     26872 Sep 23 17:58 centrifuge_report.tsv
-rw-rw---- 1 user SEC_lab_ARLNIL 363512601 Sep 23 18:41 core.full.aln
-rw-rw---- 1 user SEC_lab_ARLNIL   2189892 Sep 23 18:41 core.ref.fa
-rw-rw---- 1 user SEC_lab_ARLNIL      2151 Sep 23 18:41 core.tab
-rw-rw---- 1 user SEC_lab_ARLNIL      8610 Sep 23 18:41 core.txt
-rw-rw---- 1 user SEC_lab_ARLNIL      4562 Sep 23 18:41 core.vcf
-rw-rw---- 1 user SEC_lab_ARLNIL     11956 Sep 23 18:41 denovo.tab
-rw-rw---- 1 user SEC_lab_ARLNIL      2139 Sep 23 13:07 isolates.txt
-rw-rw---- 1 user SEC_lab_ARLNIL    140944 Sep 23 13:07 Makefile
-rw-rw---- 1 user SEC_lab_ARLNIL     16930 Sep 23 18:44 mlst.tab
-rw-rw---- 1 user SEC_lab_ARLNIL  40579262 Sep 23 18:52 nullarbor.log
-rw-rw---- 1 user SEC_lab_ARLNIL   2189892 Sep 23 13:05 ref.fa
-rw-rw---- 1 user SEC_lab_ARLNIL        29 Sep 23 18:39 ref.fa.fai
drwxrws--- 2 user SEC_lab_ARLNIL      4096 Sep 23 18:52 roary

So it looks like its going through all of the analysis, but not creating that final report? Any idea why?

I tried make report, but that doesn't work: make: *** No rule to make targetreport'. Stop.` (although I read somewhere that option was from nullarbor1).

Thanks, Christina

chrstraub commented 4 years ago

I just tried make publish

and that fails with

snippy-core --ref ref.fa 19AR0026_S53 19AR0052_S68 18AR1432_S35 19AR0012_S50 18AR1340_S53 18AR1546_S32 18AR1379_S607 18AR1466_S47 18AR1199_S25 19AR0038_S63 18GQA02_S7 18AR1288_S43 19AR0041_S65 18AR1363_S54 18AR1559_S63 19AR0032_S524 18AR1323_S45 18AR1477_S51 18GQA04_S8 18AR1543_S26 18AR1527_S25 18AR1484_S57 19AR0005_S45 18AR1380_S61 18AR1433_SS35 18AR1452_S45 19AR0011_S49 18AR1444_S41 18AR1547_S33 18AR1333_S50 19AR0025_S52 19AR0068_S84 18AR1453_S46 18AR13838_S31 19AR0070_S86 18AR1558_S62 18AR1324_S46 18AR1553_S39 18AR1263_S34 18AR1545_S31 18AR1555_S41 18AR1322_S44 18ARR0007_S31 19AR0050_S67 ATCC49226_S11 18AR1550_S36 18AR1425_S29 18AR1561_S42 18AR1474_S48 18AR1548_S34 18AR1237_S30  19AR0004_S44 18AR1561_S30 19AR0034_S61 18AR1544_S27 18AR1404_S69 19AR0035_S62 19AR0013_S51 19AR0008_S48 18AR1330_SS39 18AR1279_S38 18AR1552_S38 18AR1427_S31 18GQA06_S10 18AR1284_S41 18AR1331_S48 18AR1277_S37 18AR1374_S55 18AR13361_S55 18GQA01_S6 18AR1276_S36 18AR1217_S28 18AR1428_S32 18AR1389_S66 18AR1429_S33 18AR1513_S22 18AR1283_S40 18AR137443_S40 18AR1441_S38 19AR0031_S32 18AR1406_S27 18AR1375_S56 NCTC18848_S13 19AR0006_S46 19AR0058_S74 18AR1390_S67 188AR1542_S25 19AR0065_S81 19AR0064_S80 18AR1332_S49 19AR0059_S75 18AR1252_S33 18AR1544_S30 18AR1483_S56 18AR1541_S2719AR0040_S64 19AR0062_S78 18AR1445_S42 18AR1550_S29
Unknown option: ref
Synopsis:
  Combine multiple Snippy folders into a core SNP alignment
Usage:
  /usr/bin/snippy-core [options] [--noref] snippyDir1/ snippyDir2/ snippyDir3/ ...
Options:
  --help!         This help.
  --quiet!        No output to stderr (default '0').
  --verbose!      Verbose output (default '0').
  --inprefix=s    Preferred input prefix of Snippy files (default 'snps').
  --prefix=s      Output file prefix (default 'core').
  --noref!        Exclude reference (default '0').
  --aformat=s     Output alignment format: nexus fasta phylip maf clustalw ... (default 'fasta').
make: *** [core.aln] Error 1
chrstraub commented 4 years ago

sorry for multiple comments here, but troubleshooting along the way - so I looked for 'error' in the log and hadn't found anything, but I hadn't looked for ERROR:

[user@bio2 nullarbor_2309]$ grep -B 10 -A 10 'ERROR' nullarbor.log
Running: snp-sites -c -o core.aln core.full.aln
[18:41:41] Found exact allele match neisseria.adk-39
[18:41:41] Found exact allele match neisseria.pgm-65
[18:41:41] Found exact allele match neisseria.abcZ-59
[18:41:41] Found exact allele match neisseria.gdh-150
[18:41:41] Found exact allele match neisseria.pdhC-153
[18:41:41] Found exact allele match neisseria.aroE-67
[18:41:41] Found exact allele match neisseria.fumC-111
[18:41:42] Scanning: 18AR1484_S57/contigs.fa [18AR1484_S57/contigs.fa]
Warning: No SNPs were detected so there is nothing to output.
ERROR: Could not run: snp-sites -c -o core.aln core.full.aln
make: *** [core.aln] Error 2
make: *** Waiting for unfinished jobs....
[18:41:47] Found exact allele match neisseria.adk-39
[18:41:47] Found exact allele match neisseria.pgm-133
[18:41:47] Found exact allele match neisseria.abcZ-59
[18:41:47] Found exact allele match neisseria.gdh-149
[18:41:47] Found exact allele match neisseria.pdhC-153
[18:41:47] Found exact allele match neisseria.aroE-67
[18:41:47] Found exact allele match neisseria.fumC-156
[18:41:47] Scanning: 19AR0005_S45/contigs.fa [19AR0005_S45/contigs.fa]

which relates to the error message of make publish. so that appears to be the problem. I still don't know how to fix it though, so any help would be much appreciated!

andersgs commented 4 years ago

How diverse is your sample? I suspect something is getting in the way of the core genome. If your sample is too diverse, it is possible there is no core genome.

Could you try running make preview first to see how your samples are roughly related, and check that there are no overly contaminated samples? The README has instructions on make preview

Best of luck.

chrstraub commented 4 years ago

Yes that was the problem! There were a few samples that were a bit too contaminated, so removing those solved the issue and it created the final report fine. I guess I should have known that was the problem from looking at the roary output before... Thanks for pointing it out.

tseemann commented 4 years ago

@chrstraub in future, run make preview as the first step. this will take only 1 minute and create a MashTree to inspect for outliers.

https://github.com/tseemann/nullarbor#advanced-usage