pangenome / smoothxg

linearize and simplify variation graphs using blocked partial order alignment

Error during run of smoothxg #2

Closed: brettChapman closed this issue 1 month ago

brettChapman commented 4 years ago

Hi Erik

I tried running smoothxg on my GFA file (built with seqwish from edyeet alignments) and received the following error:

srun -n 1 singularity exec --bind /data/pangenome_paf2vg/edyeet_results:/data/pangenome_paf2vg/edyeet_results /data/smoothxg_builds/smoothxg.sif smoothxg -t 16 -g /data/pangenome_paf2vg/edyeet_results/Morex_v1_5H_vs_Morex_v2_5H.gfa -V
topological sort 89803306 of 89803306 ~ 100.0000%
[path sgd sort]: 3.33% progress: iteration: 1, eta: 448940415309625600.00, delta_max: 594444905.25, number of updates: 1269049259
[path sgd sort]: 6.67% progress: iteration: 2, eta: 94303329111190512.00, delta_max: 368510246.58, number of updates: 1269049348
[path sgd sort]: 10.00% progress: iteration: 3, eta: 19809127399056060.00, delta_max: 381083138.07, number of updates: 1269049302
[path sgd sort]: 13.33% progress: iteration: 4, eta: 4161057006262880.50, delta_max: 383619328.38, number of updates: 1269049567
[path sgd sort]: 16.67% progress: iteration: 5, eta: 874061489967219.12, delta_max: 367479371.61, number of updates: 1269050798
[path sgd sort]: 20.00% progress: iteration: 6, eta: 183603225597205.25, delta_max: 374963091.71, number of updates: 1269049436
[path sgd sort]: 23.33% progress: iteration: 7, eta: 38567245939370.37, delta_max: 367984332.28, number of updates: 1269049396
[path sgd sort]: 26.67% progress: iteration: 8, eta: 8101341654046.21, delta_max: 388226502.63, number of updates: 1269049309
[path sgd sort]: 30.00% progress: iteration: 9, eta: 1701748076561.14, delta_max: 386113946.32, number of updates: 1269049264
[path sgd sort]: 33.33% progress: iteration: 10, eta: 357465052055.07, delta_max: 434327299.29, number of updates: 1269049329
[path sgd sort]: 36.67% progress: iteration: 11, eta: 75088237325.32, delta_max: 382051011.29, number of updates: 1269049158
[path sgd sort]: 40.00% progress: iteration: 12, eta: 15772852065.42, delta_max: 385033556.35, number of updates: 1269049442
[path sgd sort]: 43.33% progress: iteration: 13, eta: 3313206850.23, delta_max: 388481862.24, number of updates: 1269049230
[path sgd sort]: 46.67% progress: iteration: 14, eta: 695964153.27, delta_max: 378469030.30, number of updates: 1269049235
[path sgd sort]: 50.00% progress: iteration: 15, eta: 146192533.26, delta_max: 271011994.39, number of updates: 1269049341
[path sgd sort]: 53.33% progress: iteration: 16, eta: 30708847.11, delta_max: 224731454.76, number of updates: 1269049320
[path sgd sort]: 56.67% progress: iteration: 17, eta: 6450625.56, delta_max: 159636655.50, number of updates: 1269054719
[path sgd sort]: 60.00% progress: iteration: 18, eta: 1355002.68, delta_max: 122656738.47, number of updates: 1269049239
[path sgd sort]: 63.33% progress: iteration: 19, eta: 284628.56, delta_max: 106244339.66, number of updates: 1269049526
[path sgd sort]: 66.67% progress: iteration: 20, eta: 59788.38, delta_max: 92229319.32, number of updates: 1269049249
[path sgd sort]: 70.00% progress: iteration: 21, eta: 12559.00, delta_max: 78504943.86, number of updates: 1269049288
[path sgd sort]: 73.33% progress: iteration: 22, eta: 2638.11, delta_max: 64098323.67, number of updates: 1269049403
[path sgd sort]: 76.67% progress: iteration: 23, eta: 554.16, delta_max: 44395128.51, number of updates: 1269049426
[path sgd sort]: 80.00% progress: iteration: 24, eta: 116.40, delta_max: 24999736.03, number of updates: 1269049253
[path sgd sort]: 83.33% progress: iteration: 25, eta: 24.45, delta_max: 12598686.25, number of updates: 1269049170
[path sgd sort]: 86.67% progress: iteration: 26, eta: 5.14, delta_max: 11266547.88, number of updates: 1269049562
[path sgd sort]: 90.00% progress: iteration: 27, eta: 1.08, delta_max: 9630000.63, number of updates: 1269049232
[path sgd sort]: 93.33% progress: iteration: 28, eta: 0.23, delta_max: 897421.29, number of updates: 1269049308
[path sgd sort]: 96.67% progress: iteration: 29, eta: 0.05, delta_max: 152573.99, number of updates: 1269049455
[path sgd sort]: 100.00% progress: iteration: 30, eta: 0.01, delta_max: 72168.32, number of updates: 1269049556
topological sort 89803306 of 89803306 ~ 100.0000%
mismatch in handle sequence for 164047
srun: error: node-6: task 0: Exited with exit code 1

I built smoothxg in Docker and then converted the local Docker image to a Singularity image. The installation looked OK. My Dockerfile is below:

FROM ubuntu:18.04

RUN echo 'debconf debconf/frontend select Noninteractive' | debconf-set-selections

RUN apt-get update && apt-get upgrade -y && apt-get install -y \
  apt-utils \
  dialog \
  build-essential \
  git \
  libssl-dev \
  libffi-dev \
  python3-dev \
  cmake \
  && rm -rf /var/lib/apt/lists/*

RUN git clone --recursive https://github.com/ekg/smoothxg.git

RUN cd smoothxg && cmake -H. -Bbuild && cmake --build build -- -j4

ENV PATH="/smoothxg/bin:${PATH}"

I'm trying to align two different versions (v1 and v2) of the Morex chromosome 5H assembly, build a genome graph from the alignment, and deconstruct it to a VCF file. I've tried other alignment tools such as GSAlign, and they show that the two versions of 5H are quite different, although the differences are mostly SNPs. I thought this would be a good test case for edyeet and smoothxg.

The run did produce a file with the suffix .prep.gfa, but I'm not sure whether that's an intermediate file or the final output. I redirected all standard output to a .smooth.gfa file, which is still empty. The .prep.gfa file is several times larger than my original .gfa file. Does that sound right to you?
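One way to see where the extra size of the .prep.gfa file comes from, and to check whether the redirected output received anything at all, is to count the records of each type in each file. The following is a minimal sketch using standard awk; the function name gfa_summary is hypothetical and not part of smoothxg. It relies only on the fact that the first tab-separated field of every GFA line is the record type (H, S, L, P, and so on).

```shell
#!/usr/bin/env bash
# gfa_summary: hypothetical helper, not part of smoothxg.
# Prints "<record type> <count>" for each GFA record type in a file,
# or fails with a message if the file is missing or empty.
gfa_summary() {
    local gfa="$1"
    if [ ! -s "$gfa" ]; then
        echo "$gfa: missing or empty" >&2
        return 1
    fi
    # The first tab-separated field of a GFA line is its record type.
    awk -F'\t' '{ count[$1]++ } END { for (t in count) print t, count[t] }' "$gfa" \
        | LC_ALL=C sort
}
```

Comparing the counts for the original .gfa and the .prep.gfa shows whether the number of segments (S) and links (L) grew. Each path is a single P line, so a change in path length shows up in the file size rather than in the P count.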

I ran the command like so:

srun -n 1 singularity exec --bind ${PWD}:${PWD} ${SMOOTHXG_IMAGE} smoothxg -t ${SLURM_NTASKS_PER_NODE} -g ${INPUTGFA} -V > ${OUTPUTGFA}

Thanks.

ekg commented 4 years ago

It looks like you've run out of memory. Is this on the most recent git HEAD?
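To confirm that a run is hitting the memory limit, it helps to record its peak resident memory. A minimal sketch, assuming GNU time is available inside the image (on Ubuntu it comes from the `time` package, which the Dockerfile above does not explicitly install): run smoothxg under `/usr/bin/time -v -o time.log`, then extract the peak from that report. The helper name peak_rss_gib is hypothetical.

```shell
#!/usr/bin/env bash
# peak_rss_gib: hypothetical helper. Reads a report written by GNU time, e.g.
#   /usr/bin/time -v -o time.log smoothxg -t 16 -g input.gfa > output.gfa
# and prints the peak resident set size in GiB (GNU time reports it in kbytes).
peak_rss_gib() {
    awk -F': ' '/Maximum resident set size/ { printf "%.2f\n", $2 / 1048576 }' "$1"
}
```

If the reported peak is close to the memory available to the job (for example, the amount requested with `srun --mem`), the crash is consistent with running out of memory.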

I just resolved a problem where the memory usage of the spoa algorithm rose over time. I think it had to do with fragmentation of the allocator that it uses.

Memory consumption could be reduced further by writing the POA graphs out to disk and reading them back in when the final graph is built. This could help a fair bit, depending on how big the graph is.
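The general pattern described here, keeping only one block's result in memory at a time by spilling each to disk as it is finished and reading them back in order at the end, can be sketched in a few lines. This is a generic illustration under stated assumptions, not smoothxg's implementation: build_block is a hypothetical stand-in for building one block's POA graph, and its text output stands in for the serialized graph.

```shell
#!/usr/bin/env bash
# build_block: hypothetical stand-in for building and serializing one block's POA graph.
build_block() {
    printf 'block %s\n' "$1"
}

# spill_blocks N OUT: builds N blocks, writing each result to its own file as soon
# as it is ready, then reads them back in block order to assemble OUT.
spill_blocks() {
    local n="$1" out="$2" dir i
    dir=$(mktemp -d) || return 1
    for ((i = 0; i < n; i++)); do
        # Zero-padded names make the glob below expand in block order.
        build_block "$i" > "$(printf '%s/block_%09d' "$dir" "$i")"
    done
    if [ "$n" -gt 0 ]; then
        cat "$dir"/block_* > "$out"
    else
        : > "$out"
    fi
    rm -rf "$dir"
}
```

The trade-off is extra disk I/O in exchange for not holding every block's result in memory until the final assembly step.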

brettChapman commented 4 years ago

Ok thanks. I think it's using the most recent version. I pulled from Git only a few days ago (27th August) to create the Docker image. Any way to reduce memory consumption would be great!

If it's only just been updated I'll create another image and try again. Thanks.

ekg commented 4 years ago

The relevant commits are from August 29. Please rebuild your docker image from the current head before testing.
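One pitfall when rebuilding: Docker caches the layer produced by the `RUN git clone ...` step, so a plain `docker build` of an unchanged Dockerfile can reuse the old checkout instead of fetching the current HEAD. A minimal sketch of a rebuild, assuming the Dockerfile above is in the current directory and a Singularity 3.x installation that supports the `docker-daemon://` source; the helper name, image tag, and .sif file name are placeholders, and `singularity build` may need sudo or `--fakeroot` depending on the setup.

```shell
#!/usr/bin/env bash
# rebuild_smoothxg_image: hypothetical helper; the tag and .sif names are placeholders.
rebuild_smoothxg_image() {
    local tag="${1:-smoothxg:latest}" sif="${2:-smoothxg.sif}"
    # --no-cache forces the git clone layer to run again, fetching the current HEAD.
    docker build --no-cache -t "$tag" . || return 1
    # Print the commit that ended up in the image (the Dockerfile clones into /smoothxg).
    docker run --rm "$tag" git -C /smoothxg rev-parse HEAD || return 1
    # Convert the local Docker image into a Singularity image.
    singularity build "$sif" "docker-daemon://$tag"
}
```

Printing the commit hash makes it easy to check that the image actually contains the August 29 changes rather than an older checkout.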

(In reply to Brett Chapman's comment of Wed, Sep 2, 2020, 03:53.)