sreeramkannan / Shannon

RNA-Seq

another ?gpmetis issue #7

Closed macmanes closed 7 years ago

macmanes commented 8 years ago

Command:

python /share/Shannon/shannon.py \
-o /mnt/data3/macmanes/error/shannon/shannon1M \
--left $RAID/error/bfc1M/bfc31.corr.fq.1 \
--right $RAID/error/bfc1M/bfc31.corr.fq.2 -p 60
--------------------------------------------
Shannon: RNA Seq de novo Assembly
--------------------------------------------
Checking the various dependencies
--------------------------------------------
Using jellyfish in /share/quorum/bin/jellyfish
Using GPMETIS in /usr/bin/gpmetis
--------------------------------------------
Thu Feb 11 06:18:26 2016: Starting Shannon run..
Using GNU Parallel in /usr/bin/parallel
Thu Feb 11 06:18:26 2016: Starting Jellyfish to extract Kmers from Reads..
Thu Feb 11 06:18:47 2016: Jellyfish finished..
Thu Feb 11 06:18:48 2016: Starting Kmer error correction..
Thu Feb 11 06:19:26 2016: Loading, processed 1000000
...
Thu Feb 11 07:05:43 2016: Correction, processed 34000000
Thu Feb 11 07:05:51 2016: 34261032 K-mers remaining after error correction.
Thu Feb 11 07:10:03 2016: Creating Kmers from (K+1)-mers..
Thu Feb 11 07:11:21 2016: 34261032 K+1-mers loaded.
Thu Feb 11 07:12:39 2016: 34331392 K-mers produced.
gpmetis -ufactor=1000 /mnt/data3/macmanes/error/shannon/shannon1M/component1.txt 41
******************************************************************************
METIS 5.0 Copyright 1998-13, Regents of the University of Minnesota
 (HEAD: , Built on: Jul 20 2013, 23:18:43)
 size of idx_t: 32bits, real_t: 32bits, idx_t *: 64bits
Graph Information -----------------------------------------------------------
 Name: /mnt/data3/macmanes/error/shannon/shannon1M/component1.txt, #Vertices: 20282, #Edges: 34658, #Parts: 41

Options ---------------------------------------------------------------------
 ptype=kway, objtype=cut, ctype=shem, rtype=greedy, iptype=metisrb
 dbglvl=0, ufactor=2.000, no2hop=NO, minconn=NO, contig=NO, nooutput=NO
 seed=-1, niter=10, ncuts=1

Direct k-way Partitioning ---------------------------------------------------
 - Edgecut: 4295, communication volume: 5769.

 - Balance:
     constraint #0:  1.999 out of 0.002

 - Most overweight partition:
     pid: 23, actual: 989, desired: 494, ratio: 2.00.

 - Subdomain connectivity: max: 33, min: 3, avg: 21.07

 - There are 25 non-contiguous partitions.
   Total components after removing the cut edges: 86,
   max components: 8 for pid: 24.

Timing Information ----------------------------------------------------------
  I/O:                             0.011 sec
  Partitioning:                    0.071 sec   (METIS time)
  Reporting:                       0.004 sec

Memory Information ----------------------------------------------------------
  Max memory used:                 2.247 MB
******************************************************************************
Thu Feb 11 07:17:37 2016: gpmetis for partitioning 1 is complete -- Component 0: 41 partitions,
Thu Feb 11 07:19:34 2016: k1mers2component dictionary created
Traceback (most recent call last):
  File "/share/Shannon/shannon.py", line 289, in <module>
    [components_broken, new_components] = kmers_for_component(kmer_directory, reads_files, base_directory_name, r1_contig_file_extension, r1_new_kmer_tag, r1_graph_file_extension, get_og_comp_kmers, get_partition_kmers, double_stranded, paired_end, False, partition_size, overload, K, gpmetis_path)
  File "/share/Shannon/kmers_for_component.py", line 261, in kmers_for_component
    reversed_read1_name=read_line1.split()[0]+'_reversed'+'\t'+'\t'.join(read_line1.split()[1:])
IndexError: list index out of range
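The gpmetis part of the log can be checked by hand, which supports the view that the partitioning step finished normally before the crash. This is a minimal sketch that reuses the numbers printed above. It makes two assumptions. First, that gpmetis turns the integer `-ufactor=x` into an allowed imbalance of 1 + x/1000, which is consistent with the `ufactor=2.000` it echoes. Second, that all vertices have unit weight.

```python
# Numbers copied from the gpmetis log above.
n_vertices = 20282      # "#Vertices: 20282"
n_parts = 41            # "#Parts: 41"
heaviest_part = 989     # "pid: 23, actual: 989"
ufactor = 1000          # "-ufactor=1000" on the gpmetis command line

# Assumption: gpmetis maps the integer ufactor x to a tolerance of 1 + x/1000,
# consistent with the "ufactor=2.000" it prints in its Options section.
allowed_imbalance = 1 + ufactor / 1000

# Assumption: unit vertex weights, so the ideal part holds n_vertices / n_parts vertices.
ideal_part = n_vertices / n_parts
imbalance = heaviest_part / ideal_part

print(f"ideal part size: {ideal_part:.2f}")
print(f"allowed imbalance: {allowed_imbalance:.3f}")
print(f"actual imbalance: {imbalance:.3f}")
print("within tolerance:", imbalance <= allowed_imbalance)
```

The computed imbalance rounds to 1.999, the same value as the `constraint #0` line in the log. That match suggests the unit-weight assumption holds for this component.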
blahah commented 8 years ago

looks unrelated to gpmetis to me. More likely the read name is being parsed on the assumption that it has multiple space-separated components, and for this particular read that's not true.
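The failing expression can be tried in isolation. In this sketch the helper name `reversed_read_name` is hypothetical, but its body is copied from line 261 of `kmers_for_component.py` as quoted in the traceback. A header with only one field does not raise, because `split()[1:]` simply becomes an empty list. The `IndexError` only appears when `split()` returns no fields at all, as it does for an empty or whitespace-only line.

```python
def reversed_read_name(read_line1):
    # Body copied from the traceback (kmers_for_component.py, line 261).
    return read_line1.split()[0] + '_reversed' + '\t' + '\t'.join(read_line1.split()[1:])

# Single-field and multi-field headers both work.
print(repr(reversed_read_name("@read1\n")))        # '@read1_reversed\t'
print(repr(reversed_read_name("@r1 1:N:0\n")))     # '@r1_reversed\t1:N:0'

# str.split() with no argument drops surrounding whitespace, so a blank line
# yields [] and indexing [0] raises IndexError.
for line in ["\n", ""]:
    try:
        reversed_read_name(line)
    except IndexError as err:
        print(repr(line), "->", type(err).__name__ + ":", err)
```

So the crash points to a line with no fields reaching this code. That is consistent with the reads being parsed in a different layout than the file actually has.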

sreeramkannan commented 8 years ago

Thanks Matt and Richard for trying out Shannon.

The issue is caused by an implicit (and wrong) way in which Shannon determines the file type: it checks whether the filename ends with .fq or .fastq. Your filename, $RAID/error/bfc1M/bfc31.corr.fq.1, does not, so Shannon treats the file as FASTA, which is clearly wrong. It then fails in the kmers_for_component read assignment, the first step at which Shannon handles reads directly.
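This sketch contrasts a suffix check like the one described above with a content-based check. Both function names are hypothetical, and neither is Shannon's actual code. The content check relies on a standard property of the formats: FASTQ records begin with `@` and FASTA records begin with `>`.

```python
import os
import tempfile

def guess_type_by_suffix(path):
    # Mimics the described behaviour: only names ending in .fq/.fastq count as FASTQ.
    return "fastq" if path.endswith((".fq", ".fastq")) else "fasta"

def guess_type_by_content(path):
    # Hypothetical alternative: inspect the first non-blank line of the file.
    with open(path) as handle:
        for line in handle:
            stripped = line.strip()
            if not stripped:
                continue
            if stripped.startswith("@"):
                return "fastq"
            if stripped.startswith(">"):
                return "fasta"
            break
    raise ValueError(f"cannot determine read format of {path}")

# The file name from this issue is misclassified by the suffix check.
print(guess_type_by_suffix("$RAID/error/bfc1M/bfc31.corr.fq.1"))  # fasta

# A small FASTQ file with the same kind of ".fq.1" suffix is detected correctly by content.
with tempfile.NamedTemporaryFile("w", suffix=".fq.1", delete=False) as tmp:
    tmp.write("@read1\nACGT\n+\nIIII\n")
print(guess_type_by_content(tmp.name))  # fastq
os.remove(tmp.name)
```

Reading one line costs almost nothing and does not depend on naming conventions such as `.fq.1`/`.fq.2`. Until a fix like this lands, a likely workaround is to give the input files names ending in `.fq`. Under the extension-based check described above, that should make Shannon detect them as FASTQ.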

I will update Shannon to deal with this issue.