shendurelab / LACHESIS

The LACHESIS software, as described in Nature Biotechnology (http://dx.doi.org/10.1038/nbt.2727)

How to get chromosome bins #53

Open mictadlo opened 5 years ago

mictadlo commented 5 years ago

Hi, running LACHESIS as shown below did not give the expected number of chromosomes: I got 115 groups instead.

/usr/local/bin/Lachesis lachesis.ini
/LACHESIS/src/bin/CreateScaffoldedFasta.pl QMg_NbQ4P_RN.fasta lachesis

cat lachesis/REPORT.txt provided:

SPECIES = plant
OUTPUT_DIR = lachesis
DRAFT_ASSEMBLY_FASTA = QMg_NbQ4P_RN.fasta
SAM_DIR = /QRISdata/Q0231/lachesis
SAM_FILES = N_Ben_HiC2_rep1.bam N_Ben_HiC4_rep1.bam
RE_SITE_SEQ = GATC
USE_REFERENCE = 0
SIM_BIN_SIZE = 0
REF_ASSEMBLY_FASTA = test_case/hg19/Homo_sapiens_assembly19.fasta
BLAST_FILE_HEAD = test_case/draft_assembly/assembly
DO_CLUSTERING = 1
DO_ORDERING   = 1
DO_REPORTING  = 1
OVERWRITE_GLM = 0
OVERWRITE_CLMS = 0
CLUSTER_N = 19
CLUSTER_CONTIGS_WITH_CENS = -1
CLUSTER_MIN_RE_SITES = 25
CLUSTER_MAX_LINK_DENSITY = 2
CLUSTER_NONINFORMATIVE_RATIO = 3
CLUSTER_DRAW_HEATMAP = 1
CLUSTER_DRAW_DOTPLOT = 1
ORDER_MIN_N_RES_IN_TRUNK = 15
ORDER_MIN_N_RES_IN_SHREDS = 15
ORDER_DRAW_DOTPLOTS = 1
REPORT_EXCLUDED_GROUPS = -1
REPORT_QUALITY_FILTER = 1
REPORT_DRAW_HEATMAP = 1

ReportChart!

Info about input assembly:
DE NOVO ASSEMBLY, with no reference genome (less validation available)
Species: benth
N contigs:      1512            Total length:   2774612304              N50:    4284592
N clusters (derived):   115
N non-singleton clusters:       22
N orderings found:      115

############################
#                          #
#    CLUSTERING METRICS    #
#                          #
############################

Number of contigs in clusters:  1495            (98.88% of all contigs)
Length of contigs in clusters:  2773873324      (99.97% of all sequence length)

+----------+-----------+-------------+
|  CLUSTER | NUMBER OF |  LENGTH OF  |
|  NUMBER  |  CONTIGS  |   CONTIGS   | 
+----------+-----------+-------------+
|      0   |     114   |   285238080 |
|      1   |      97   |   232421157 |
|      2   |     117   |   197340285 |
|      3   |      84   |   187516710 |
|      4   |      80   |   179402476 |
|      5   |      89   |   165376221 |
|      6   |      65   |   157315626 |
|      7   |      80   |   151833938 |
|      8   |      80   |   148910574 |
|      9   |      79   |   140080377 |
|     10   |      88   |   137055451 |
|     11   |      65   |   135577112 |
|     12   |      60   |   133912412 |
|     13   |      70   |   117818930 |
|     14   |      65   |   116531146 |
|     15   |      63   |   102263122 |
|     16   |      28   |    93089711 |
|     17   |      48   |    87456991 |
|     18   |      15   |      964930 |
|     19   |       6   |      294111 |
|     20   |       7   |      283069 |
|     21   |       1   |      239832 |
|     22   |       1   |      145336 |
|     23   |       1   |      104136 |
|     24   |       1   |      101472 |
|     25   |       1   |       94178 |
|     26   |       1   |       77308 |
|     27   |       1   |       67648 |
|     28   |       1   |       67087 |
|     29   |       1   |       64664 |
|     30   |       1   |       59313 |
|     31   |       1   |       59081 |
|     32   |       1   |       57897 |
|     33   |       1   |       53810 |
|     34   |       1   |       50546 |
|     35   |       1   |       49583 |
|     36   |       2   |       48675 |
|     37   |       1   |       48060 |
|     38   |       1   |       44526 |
|     39   |       1   |       39160 |
|     40   |       1   |       37315 |
|     41   |       1   |       35095 |
|     42   |       1   |       32532 |
|     43   |       1   |       29921 |
|     44   |       1   |       28202 |
|     45   |       1   |       26998 |
|     46   |       1   |       26886 |
|     47   |       1   |       26813 |
|     48   |       1   |       26698 |
|     49   |       1   |       26687 |
|     50   |       1   |       26517 |
|     51   |       1   |       26501 |
|     52   |       1   |       26414 |
|     53   |       1   |       26363 |
|     54   |       1   |       26348 |
|     55   |       1   |       26272 |
|     56   |       1   |       26153 |
|     57   |       1   |       26101 |
|     58   |       1   |       26099 |
|     59   |       1   |       26012 |
|     60   |       1   |       25913 |
|     61   |       1   |       25836 |
|     62   |       1   |       25798 |
|     63   |       1   |       25728 |
|     64   |       1   |       25694 |
|     65   |       1   |       25584 |
|     66   |       1   |       25530 |
|     67   |       1   |       25343 |
|     68   |       1   |       25268 |
|     69   |       1   |       25212 |
|     70   |       1   |       25077 |
|     71   |       1   |       24936 |
|     72   |       1   |       24853 |
|     73   |       1   |       24700 |
|     74   |       1   |       24228 |
|     75   |       1   |       23985 |
|     76   |       1   |       23909 |
|     77   |       1   |       23321 |
|     79   |       1   |       23222 |
|     80   |       1   |       23141 |
|     81   |       1   |       23114 |
|     82   |       1   |       22951 |
|     83   |       1   |       22856 |
|     84   |       1   |       22373 |
|     85   |       1   |       22328 |
|     86   |       1   |       22169 |
|     87   |       1   |       20926 |
|     88   |       1   |       20183 |
|     89   |       1   |       19684 |
|     90   |       1   |       19675 |
|     91   |       1   |       19626 |
|     92   |       1   |       19153 |
|     93   |       1   |       18885 |
|     94   |       1   |       18838 |
|     95   |       1   |       18639 |
|     96   |       1   |       18249 |
|     97   |       1   |       18248 |
|     98   |       1   |       18233 |
|     99   |       1   |       18201 |
|    100   |       1   |       18200 |
|    101   |       1   |       18180 |
|    102   |       1   |       18142 |
|    103   |       1   |       17982 |
|    104   |       1   |       17787 |
|    105   |       1   |       17473 |
|    106   |       1   |       17401 |
|    107   |       1   |       17265 |
|    108   |       1   |       16586 |
|    109   |       1   |       16091 |
|    110   |       1   |       16056 |
|    111   |       1   |       15989 |
|    112   |       1   |       15859 |
|    113   |       1   |       15540 |
|    114   |       1   |       15213 |
+----------+-----------+-------------+
|   TOTAL  |    1495   |  2773873324 |
+----------+-----------+-------------+

############################
#                          #
#     ORDERING METRICS     #
#                          #
############################

Number of contigs in orderings: 0               (0% of all contigs in clusters, 0% of all contigs)
Length of contigs in orderings: 0       (0% of all length in clusters, 0% of all sequence length)
Number of contigs in trunks:    0               (-nan% of contigs in orderings)
Length of contigs in trunks:    0       (-nan% of length in orderings)

Fraction of contigs in orderings with high orientation quality: 0 (-nan%), with length 0 (-nan%)
Fraction of contigs in trunks    with high orientation quality: 0 (-nan%), with length 0 (-nan%)

How can I get the expected 19 chromosomes?

Thank you in advance,

Michal

JingaJenga commented 5 years ago

Hi Michal,

Thanks for your e-mail, and for your interest in the LACHESIS software! The first thing I should mention is that LACHESIS is no longer being actively developed or maintained, as stated on the GitHub front page. I recommend you take a look at the Juicer software from the Aiden lab (https://github.com/theaidenlab), a more recently developed and actively maintained piece of code that serves roughly the same purpose. Also, if you want a research kit that will ensure high-quality Hi-C results, I suggest contacting the folks at Phase Genomics (https://phasegenomics.com/).

As for your concern about 19 chromosomes: As stated in the original paper, LACHESIS can predict roughly, but not precisely, the number of chromosomes in the assembly. Your assembly actually shows a pretty steep drop-off in size after the first 18 scaffolds (#0-#17): cluster #17 is about 87 Mb, while cluster #18 is just under 1 Mb. This suggests that LACHESIS has correctly picked up on intra-chromosomal signals; even in the absence of external information, you could have estimated roughly 19 chromosomes from the scaffold sizes. I suggest you interpret the 19 largest scaffolds as roughly equivalent to the 19 chromosomes, with some possible noisiness around the merge (cluster #18 in particular is borderline in size). The other, smaller scaffolds are likely true chromosomal sequence that should have been merged into scaffolds #0-#18, but LACHESIS did not see a strong enough signal to make that merge. Note that the combined length of scaffolds #19-#114 is only about 3.8 Mb.

-- Josh
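
The size drop-off described above can be checked numerically. The following is only a minimal sketch, not part of LACHESIS: the function names `parse_clusters` and `largest_drop` are made up for illustration, and the sample rows are copied from the first REPORT.txt table in this thread (clusters 0 to 20 only, to keep the sample short). It looks for the largest ratio between consecutive cluster lengths.

```python
import re

# Rows copied from the first REPORT.txt table in this thread (clusters 0-20).
REPORT_ROWS = """\
|      0   |     114   |   285238080 |
|      1   |      97   |   232421157 |
|      2   |     117   |   197340285 |
|      3   |      84   |   187516710 |
|      4   |      80   |   179402476 |
|      5   |      89   |   165376221 |
|      6   |      65   |   157315626 |
|      7   |      80   |   151833938 |
|      8   |      80   |   148910574 |
|      9   |      79   |   140080377 |
|     10   |      88   |   137055451 |
|     11   |      65   |   135577112 |
|     12   |      60   |   133912412 |
|     13   |      70   |   117818930 |
|     14   |      65   |   116531146 |
|     15   |      63   |   102263122 |
|     16   |      28   |    93089711 |
|     17   |      48   |    87456991 |
|     18   |      15   |      964930 |
|     19   |       6   |      294111 |
|     20   |       7   |      283069 |
"""

# One table row: | cluster number | number of contigs | length of contigs |
ROW = re.compile(r"^\|\s*(\d+)\s*\|\s*(\d+)\s*\|\s*(\d+)\s*\|")


def parse_clusters(text):
    """Return (cluster_id, n_contigs, length) tuples for each table row."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            rows.append(tuple(int(group) for group in match.groups()))
    return rows


def largest_drop(rows):
    """Return (number of clusters before the biggest size drop, drop ratio)."""
    lengths = sorted((row[2] for row in rows), reverse=True)
    ratios = [lengths[i] / lengths[i + 1] for i in range(len(lengths) - 1)]
    best = max(range(len(ratios)), key=ratios.__getitem__)
    return best + 1, ratios[best]


rows = parse_clusters(REPORT_ROWS)
n_large, ratio = largest_drop(rows)
print(f"{n_large} large clusters, then a {ratio:.1f}-fold drop in length")
```

On these rows the largest relative drop falls between clusters #17 and #18. Applied to the 20-row table from the later run in this thread, the same calculation would put the break after 19 clusters, between #18 and #19.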

mictadlo commented 5 years ago

Hi Josh, Thank you for your explanation. By any chance, do you know why none of the contigs have been ordered?

JingaJenga commented 5 years ago

I'm not sure. The clusters are pretty large, so there should be enough signal to order them. Either there is a severe lack of Hi-C link density, or some of your assembly files might have been created incompletely. Try setting OVERWRITE_CLMS = 1.

mictadlo commented 5 years ago

Hi Josh, I wish you a Happy New Year. I have now created the BAM files with the command below, as recommended by Phase Genomics:

bwa mem -5SP [assembly.fasta] [fwd_hic.fastq] [rev_hic.fastq] | samblaster | samtools view -S -h -b -F 2316 > [aligned.bam]

This has reduced the number of clusters from 115 to 20.
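
For readers wondering what the `-F 2316` filter in that samtools command removes: the value is a bit mask over the SAM FLAG field, and `-F` excludes any record that has at least one of the masked bits set. Here is a small sketch that decodes the mask using the bit meanings from the SAM format specification; the helper names `decode_flag` and `is_excluded` are made up for illustration.

```python
# Bit values and meanings as defined for the FLAG field in the SAM specification.
SAM_FLAGS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "not primary alignment",
    0x200: "read fails quality checks",
    0x400: "PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def decode_flag(flag):
    """List the names of all bits set in a SAM flag or filter mask."""
    return [name for bit, name in sorted(SAM_FLAGS.items()) if flag & bit]


def is_excluded(read_flag, exclude_mask=2316):
    """Mimic `samtools view -F MASK`: drop a record if any masked bit is set."""
    return bool(read_flag & exclude_mask)


for name in decode_flag(2316):
    print(name)
```

So 2316 = 4 + 8 + 256 + 2048, and the filter keeps only primary, non-supplementary alignments in which both the read and its mate are mapped.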

ReportChart!

Info about input assembly:
DE NOVO ASSEMBLY, with no reference genome (less validation available)
Species: benth
N contigs:      1512            Total length:   2774612304              N50:    4284592
N clusters (derived):   20
N non-singleton clusters:       20
N orderings found:      20

############################
#                          #
#    CLUSTERING METRICS    #
#                          #
############################

Number of contigs in clusters:  1495            (98.88% of all contigs)
Length of contigs in clusters:  2773948172      (99.98% of all sequence length)

+----------+-----------+-------------+
|  CLUSTER | NUMBER OF |  LENGTH OF  |
|  NUMBER  |  CONTIGS  |   CONTIGS   | 
+----------+-----------+-------------+
|      0   |     207   |   304822244 |
|      1   |     104   |   251236598 |
|      2   |     103   |   215915806 |
|      3   |      85   |   185990618 |
|      4   |      96   |   185821186 |
|      5   |     137   |   169943199 |
|      6   |      79   |   169694706 |
|      7   |      87   |   160635652 |
|      8   |      80   |   155356232 |
|      9   |      80   |   128553045 |
|     10   |      59   |   121698875 |
|     11   |      53   |   120471892 |
|     12   |      62   |   114055062 |
|     13   |      45   |   105996889 |
|     14   |      53   |   105077856 |
|     15   |      57   |    88736847 |
|     16   |      44   |    76993531 |
|     17   |      30   |    68241346 |
|     18   |      28   |    44391277 |
|     19   |       6   |      315311 |
+----------+-----------+-------------+
|   TOTAL  |    1495   |  2773948172 |
+----------+-----------+-------------+

Unfortunately, they are not ordered and oriented:

Number of contigs in orderings: 0               (0% of all contigs in clusters, 0% of all contigs)
Length of contigs in orderings: 0       (0% of all length in clusters, 0% of all sequence length)
Number of contigs in trunks:    0               (-nan% of contigs in orderings)
Length of contigs in trunks:    0       (-nan% of length in orderings)

Fraction of contigs in orderings with high orientation quality: 0 (-nan%), with length 0 (-nan%)
Fraction of contigs in trunks    with high orientation quality: 0 (-nan%), with length 0 (-nan%)

I also tried OVERWRITE_CLMS = 1, without success. Could this be caused by the files below, which were created outside the output folder?

-rw-r--r--  1 1032814217 root  24K Jan  2 03:39 QMg_NbQ4P_RN.fasta.counts_GATC.txt
-rw-r--r--  1 1032814217 root  17K Jan  2 04:13 QMg_NbQ4P_RN.fasta.names
-rw-r--r--  1 1032814217 root  102 Jan  2 05:46 heatmap.chrom_breaks.txt
-rw-r--r--  1 1032814217 root    6 Jan  2 05:46 heatmap.txt

Thank you in advance,

Michal

baozg commented 5 years ago

@mictadlo I have the same trouble with ordering. Have you solved it?

jazberna1 commented 5 years ago

Hi,

I had the same issue: no contig ordering at all. I then found that my SAM file was not sorted by read name.

Jorge
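
One way to test for the situation Jorge describes is to check whether all records that share a read name sit next to each other in the file. The sketch below is an illustration only, not something shipped with LACHESIS: the function name `reads_grouped_by_name` and the tiny hand-made records are assumptions. It works on SAM text lines, so a BAM file would first have to be converted to text, for example with `samtools view -h`.

```python
def reads_grouped_by_name(sam_lines):
    """Return True if every read name occupies one contiguous run of records."""
    seen = set()      # read names whose run has already started
    previous = None   # read name of the previous alignment record
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header lines and blank lines
        qname = line.split("\t", 1)[0]
        if qname != previous:
            if qname in seen:
                return False  # this name appeared earlier, separated by other reads
            seen.add(qname)
            previous = qname
    return True


def record(qname, flag):
    """Build a minimal 11-column SAM record; only QNAME matters for the check."""
    fields = [qname, str(flag), "ctg1", "1", "60", "4M", "=", "1", "0", "ACGT", "IIII"]
    return "\t".join(fields)


header = "@HD\tVN:1.6\tSO:unknown"
name_grouped = [header, record("r1", 65), record("r1", 129),
                record("r2", 65), record("r2", 129)]
interleaved = [header, record("r1", 65), record("r2", 65),
               record("r1", 129), record("r2", 129)]

print(reads_grouped_by_name(name_grouped))
print(reads_grouped_by_name(interleaved))
```

The check keeps every read name in memory, so for a full Hi-C data set it is probably better run on a sample of the file. If the reads turn out not to be grouped, `samtools sort -n` is the usual way to sort a BAM file by read name.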