thegenemyers / MERQURY.FK

FastK based version of Merqury

Tetraploid HiFi assembly for QV estimation #5

Open: baozg opened this issue 2 years ago

baozg commented 2 years ago

Hi, @thegenemyers

Does Merqury assume a diploid genome? We have a phased, chromosome-level autotetraploid plant genome. Can I use Merqury for QV estimation, or do I need to modify anything?

Thanks, Zhigui Bao
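For context on the QV question: the Merqury paper (Rhie et al., 2020) estimates a per-base error rate from k-mers that occur in the assembly but not in the reads, E = 1 - (1 - K_asm/K_total)^(1/k), and reports QV = -10 * log10(E). The formula itself has no ploidy term. The shell function below is a hypothetical helper (`merqury_qv` is not part of MERQURY.FK or FASTK) that evaluates this formula from two k-mer counts; it is a sketch of the published formula, not MerquryFK's own code.

```shell
# Hypothetical helper, not shipped with MERQURY.FK:
#   merqury_qv <asm_only_kmers> <total_asm_kmers> [k]
# Evaluates the k-mer QV formula from the Merqury paper:
#   E  = 1 - (1 - asm_only / total)^(1/k)
#   QV = -10 * log10(E)
merqury_qv() {
  awk -v a="$1" -v t="$2" -v k="${3:-31}" 'BEGIN {
    # No assembly-only k-mers means no detected errors: QV is unbounded.
    if (a <= 0) { print "inf"; exit }
    e = 1 - (1 - a / t) ^ (1 / k)
    # awk log() is the natural log, so convert to log10.
    printf "%.2f\n", -10 * log(e) / log(10)
  }'
}
```

For example, `merqury_qv 31 1000000000` (31 assembly-only 31-mers out of 10^9) gives an error rate of about 1e-9 and prints `90.00`.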

baozg commented 2 years ago

I have tried FASTK and MERQURY.FK, but MERQURY.FK throws a segmentation fault. Here are the commands I used:

FASTK and MERQURY.FK were installed today.
```shell
# FASTK for PE250 reads
pigz -p 32 -dc WGS.R1.clean.fastq.gz > WGS.R1.clean.fastq
pigz -p 32 -dc WGS.R2.clean.fastq.gz > WGS.R2.clean.fastq
## FASTK
~/software/FASTK/bin/FastK -v -PWGS -T64 -M200 -k31 -t1 WGS.R1.clean.fastq WGS.R2.clean.fastq

## MERQURY.FK
~/software/MERQURY.FK/bin/MerquryFK -v -T64 -pdf WGS.R1.clean.ktab ./HiFi.V1.fa test
```

Here are the two logs:

# FASTK

Partitioning 2 .fastq files into 64 parts

Determining minimizer scheme & partition for WGS.R1.clean
  Estimate 214.125G 31-mers
  Dividing data into 11 blocks
  Using 7-minimizers with 1048 core prefixes

Phase 1: Partitioning K-mers into 704 Super-mer Files

  There are 977,698,548 reads totalling 244,424,637,000 bps

     Part:           31-mer      super-mers  ave. length
        0:   19,682,823,520   1,663,529,566         11.8
        1:   19,674,612,623   1,769,086,703         11.1
        2:   19,587,575,233   1,943,187,015         10.1
        3:   19,495,814,725   1,840,956,781         10.6
        4:   19,544,818,789   1,815,438,713         10.8
        5:   19,525,732,179   1,915,986,018         10.2
        6:   19,433,353,106   1,797,203,569         10.8
        7:   19,506,601,185   1,674,358,484         11.7
        8:   19,393,772,868   1,889,521,351         10.3
        9:   19,759,183,095   1,588,962,080         12.4
       10:   19,475,000,660   1,851,361,455         10.5
      Sum:  215,079,287,983  19,749,591,735         10.9

      Range 19,393,772,868 - 19,759,183,095 (1.87%)

  Resources for phase:  68:10.064u  27:05.233s  5:07.933w  1856.0%

Phase 2: Sorting & Counting K-mers in 11 blocks

      Part:    wgt'd k-mers  savings
         0:   2,162,467,906      9.1
         1:   2,056,330,156      9.6
         2:   1,964,553,110     10.0
         3:   2,033,779,444      9.6
         4:   2,531,557,027      7.7
         5:   2,008,534,812      9.7
         6:   1,918,481,344     10.1
         7:   1,965,740,025      9.9
         8:   1,941,849,517     10.0
         9:   2,200,607,424      9.0
        10:   1,895,100,055     10.3
       All:  22,679,000,820      9.5

  Resources for phase:  69:39.686u  11:23.827s  8:13.343w  985.8%

Phase 3 (-t option): Merging K-mer Table Parts

  There are 9,391,432,901 31-mers that occur 1-or-more times

  The table occupies 65.87 GB

  Resources for phase:  7:21.941u  2:38.527s  1:55.494w  519.9%

Total Resources:  145:11.693u  41:07.587s  15:16.770w  1219.4%  107MB
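As a quick sanity check on the FASTK Phase 1 numbers, the snippet below recomputes the mean read length and the number of 31-mers the reads could contain. It assumes every read is at least 31 bp and contains no ambiguous bases, in which case the count is total bases minus (k - 1) per read; the variable names are illustrative only.

```shell
# Figures copied from the FASTK Phase 1 log.
reads=977698548
bases=244424637000
k=31

# Mean read length (integer division); should be 250 for PE250 data.
mean_len=$(( bases / reads ))
echo "mean read length: ${mean_len} bp"

# Each read of length L >= k holds L - k + 1 k-mers, so summed over all
# reads: bases - reads * (k - 1). Ambiguous bases can only lower this.
max_kmers=$(( bases - reads * (k - 1) ))
echo "max ${k}-mers: ${max_kmers}"
```

This prints a mean read length of 250 bp and a maximum of 215093680560 31-mers; the `Sum:` line in the log (215,079,287,983) sits slightly below that bound, as it should.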

## MERQURY.FK

 Single diploid assembly, no trio data

 Kmer size is 31

 Solid k-mer cutoff is 22

 Making CN-spectra plots for ./HiFi.V1

 Making .qv and .bed files for assembly ./HiFi.V1
/var/spool/slurm/d/job3906010/slurm_script: line 4: 33521 Segmentation fault      ~/software/MERQURY.FK/bin/MerquryFK -v -T64 -pdf WGS.R1.clean.ktab ./HiFi.V1.fa test