maize-genetics / phg_v2

PHG version 2
https://phg.maizegenetics.net/
Apache License 2.0

[BUG]: Compatibility issue between pipeline steps #220

Closed by gsc74 3 weeks ago

gsc74 commented 3 weeks ago

Description

@tcasstevens, I'm running PHG v2 with the following commands:

./phg initdb \
    --db-path db \
    --gvcf-anchor-gap 1000000 \
    --hvcf-anchor-gap 1000

./phg prepare-assemblies \
    --keyfile MHC.keyfile \
    --threads 32 \
    --output-dir output

./phg create-ranges \
    --reference-file MHC-CHM13.0.fa \
    --gff MHC_chm13_adjusted.gff3 \
    --boundary gene \
    --pad 500 \
    --range-min-size 500 \
    -o output/ranges.bed

./phg align-assemblies \
    --gff MHC_chm13_adjusted.gff3 \
    --reference-file MHC-CHM13.0.fa \
    --assembly-file-list MHC.keyfile \
    -o output

MHC.keyfile looks like this:

MHC-CHM13.0.fa  Ref
MHC-HG002.1.fa  Hap_1
MHC-HG002.2.fa  Hap_2

With this MHC.keyfile, prepare-assemblies runs fine, but align-assemblies fails.

Please check the attached log:

[main] WARN net.maizegenetics.phgv2.cli.Initdb 2024-09-12 13:33:57,470: Folder db does not exist - creating.
begin Command to create hvcf dataset:conda run -n phgv2-conda tiledbvcf create --uri db/hvcf_dataset -n --log-level debug --log-file db/temp/tiledbvcf_createHvcf.log --anchor-gap 1000
[main] INFO net.maizegenetics.phgv2.cli.Initdb 2024-09-12 13:33:57,473: begin Command to create hvcf dataset:conda run -n phgv2-conda tiledbvcf create --uri db/hvcf_dataset -n --log-level debug --log-file db/temp/tiledbvcf_createHvcf.log --anchor-gap 1000
begin Command to create gvcf dataset:conda run -n phgv2-conda tiledbvcf create --uri db/gvcf_dataset -n --log-level debug --log-file db/temp/tiledbvcf_createHvcf.log --anchor-gap 1000000
[main] INFO net.maizegenetics.phgv2.cli.Initdb 2024-09-12 13:33:58,796: begin Command to create gvcf:conda run -n phgv2-conda tiledbvcf create --uri db/gvcf_dataset -n --log-level debug --log-file db/temp/tiledbvcf_createHvcf.log --anchor-gap 1000000
[main] INFO net.maizegenetics.phgv2.cli.PrepareAssemblies 2024-09-12 13:34:00,752: creating assembliesList, calling createParallelAnnotatedFastas
[main] INFO net.maizegenetics.phgv2.cli.PrepareAssemblies 2024-09-12 13:34:00,782: Adding entries to the inputChannel:
[main] INFO net.maizegenetics.phgv2.cli.PrepareAssemblies 2024-09-12 13:34:00,783: adding Ref to the inputChannel
[main] INFO net.maizegenetics.phgv2.cli.PrepareAssemblies 2024-09-12 13:34:00,783: adding Hap_1 to the inputChannel
[main] INFO net.maizegenetics.phgv2.cli.PrepareAssemblies 2024-09-12 13:34:00,783: adding Hap_2 to the inputChannel
[main] INFO net.maizegenetics.phgv2.cli.PrepareAssemblies 2024-09-12 13:34:00,783: Done adding data to the inputChannel
annotateFasta: entry = Hap_1
annotateFasta: entry = Hap_2
annotateFasta: entry = Ref
[main] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 13:34:03,166: getSystemMemory: Total system memory: 201116119040 Bytes, 201.11611904 GB, 187.0379907072 GiB
[main] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 13:34:03,169: calculateNumThreadsAndRuns: systemMemory: 187.0379907072, processors: 78
[main] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 13:34:03,169: calculateNumThreadsAndRuns: totalThreadsToUse: 78
[main] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 13:34:03,169: calculateNumThreadsAndRuns: max concurrent threads: 8
[main] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 13:34:03,171: maximizeRunsAndThreads: totalConcurrentThreads: 8, totalAssemblies: 3
[main] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 13:34:03,171: maximizeRunsAndThreads: potential run/thread combinations:
[main] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 13:34:03,171: numAlignments  threadsPerAlignments
[main] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 13:34:03,171: 2  4
[main] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 13:34:03,171: 1  8
[main] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 13:34:03,171: Running 2 runs with 4 threads per runs
[main] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 13:34:03,172: calculatedNumThreadsAndRuns: returning runsAndThreads values: (2, 4)
[main] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 13:34:03,172: createCDSfromRefData command:conda run -n phgv2-conda anchorwave gff2seq -r MHC-CHM13.0.fa -i MHC_chm13_adjusted.gff3 -o output/ref.cds.fasta
[main] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 13:34:06,230: Ref minimap Command: conda run -n phgv2-conda minimap2 -x splice -t 4 -k 12 -a -p 0.4 -N20 MHC-CHM13.0.fa output/ref.cds.fasta -o output/MHC-CHM13.0.sam
[main] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 13:34:07,864: Adding entries to the inputChannel:
[main] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 13:34:07,865: Adding: MHC-CHM13.0.fa Ref for processing
[main] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 13:34:07,870: Adding: MHC-HG002.1.fa Hap_1 for processing
[main] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 13:34:07,870: Adding: MHC-HG002.2.fa Hap_2 for processing
[DefaultDispatcher-worker-1] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 13:34:07,872: alignAssembly: asmFileFull: MHC-HG002.1.fa   Hap_1, outputFile: output/MHC-HG002.1.sam , threadsPerRun: 4
[DefaultDispatcher-worker-2] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 13:34:07,872: alignAssembly: asmFileFull: MHC-CHM13.0.fa   Ref, outputFile: output/MHC-CHM13.0.sam , threadsPerRun: 4
[DefaultDispatcher-worker-1] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 13:34:07,873: redirectError: output/minimap2_MHC-HG002.1_error.log
[DefaultDispatcher-worker-2] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 13:34:07,873: redirectError: output/minimap2_MHC-CHM13.0_error.log
[DefaultDispatcher-worker-2] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 13:34:07,873:  begin minimap assembly Command: conda run -n phgv2-conda minimap2 -x splice -t 4 -k 12 -a -p 0.4 -N20 MHC-CHM13.0.fa    Ref output/ref.cds.fasta -o output/MHC-CHM13.0.sam
[DefaultDispatcher-worker-1] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 13:34:07,873:  begin minimap assembly Command: conda run -n phgv2-conda minimap2 -x splice -t 4 -k 12 -a -p 0.4 -N20 MHC-HG002.1.fa    Hap_1 output/ref.cds.fasta -o output/MHC-HG002.1.sam
[DefaultDispatcher-worker-1] ERROR net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 13:34:08,971: minimap2 for assembly MHC-HG002.1.fa    Hap_1 run via ProcessBuilder returned error code 1
[DefaultDispatcher-worker-2] ERROR net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 13:34:08,971: minimap2 for assembly MHC-CHM13.0.fa    Ref run via ProcessBuilder returned error code 1
Exception in thread "main" java.lang.IllegalStateException: alignAssembly: error running minimap2 for MHC-HG002.1: 1
    at net.maizegenetics.phgv2.cli.AlignAssemblies$alignAssembly$2.invokeSuspend(AlignAssemblies.kt:577)
    at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
    at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:104)
    at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:584)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:811)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:715)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:702)
    Suppressed: java.lang.IllegalStateException: alignAssembly: error running minimap2 for MHC-CHM13.0: 1
        ... 7 more

Expected behavior

No response

PHG version

2.4.7.161

lynnjo commented 3 weeks ago

@gsc74 Please go to your output folder and look for the minimap2 logs. They will provide more information on what is wrong. Was it able to find the file to align? You may need the full path name in your keyfile.

gsc74 commented 3 weeks ago

[ERROR] failed to open file 'MHC-HG002.1.fa Hap_1': No such file or directory

ERROR conda.cli.main_run:execute(125): conda run minimap2 -x splice -t 4 -k 12 -a -p 0.4 -N20 MHC-HG002.1.fa Hap_1 output/ref.cds.fasta -o output/MHC-HG002.1.sam failed. (See above for error)

gsc74 commented 3 weeks ago

I tried providing the full path as well; the error is the same:

[ERROR] failed to open file '/home/ghanshyam/test/phg/bin/MHC-HG002.1.fa Hap_1': No such file or directory

ERROR conda.cli.main_run:execute(125): conda run minimap2 -x splice -t 4 -k 12 -a -p 0.4 -N20 /home/ghanshyam/test/phg/bin/MHC-HG002.1.fa Hap_1 output/ref.cds.fasta -o output/MHC-HG002.1.sam failed. (See above for error)

gsc74 commented 3 weeks ago

I believe the issue lies with the "MHC.keyfile." I noticed that the reference name is given as '/home/ghanshyam/test/phg/bin/MHC-HG002.1.fa Hap_1', but only '/home/ghanshyam/test/phg/bin/MHC-HG002.1.fa' exists as a file. It seems the tool is reading the entire line as a file path during alignment instead of just reading column 1 from the MHC.keyfile.
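
The behavior described above can be reproduced outside PHG with a small shell sketch. It treats each whole line of a list file as a path, which is what the minimap2 error message suggests is happening. The function name `check_list_paths` and the sample file names are hypothetical, and this is not PHG's own code.

```shell
# Hypothetical helper: treat every whole line of a list file as a path
# and report the lines that do not name an existing file.
check_list_paths() {
    missing=0
    while IFS= read -r line || [ -n "$line" ]; do
        if [ -z "$line" ]; then
            continue                        # skip blank lines
        fi
        if [ ! -f "$line" ]; then
            printf 'not a file: %s\n' "$line"
            missing=$((missing + 1))
        fi
    done < "$1"
    [ "$missing" -eq 0 ]                    # status 0 only if every line is a file
}

# Demo in a scratch directory with made-up file names.
demo_dir=$(mktemp -d)
touch "$demo_dir/asm1.fa"
printf '%s\tHap_1\n' "$demo_dir/asm1.fa" > "$demo_dir/two_column.txt"
printf '%s\n' "$demo_dir/asm1.fa" > "$demo_dir/one_column.txt"

check_list_paths "$demo_dir/two_column.txt" || echo "two-column file rejected"
check_list_paths "$demo_dir/one_column.txt" && echo "one-column file accepted"
```

A line such as `asm1.fa<TAB>Hap_1` fails the check even though `asm1.fa` exists, which matches the "No such file or directory" message in the minimap2 log.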

lynnjo commented 3 weeks ago

It looks like minimap2 is taking both columns together as the file name. Verify that you have a tab rather than a space between the columns in your keyfile.
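
One way to check the delimiter is to count tab-separated fields per line with `awk`. This is only a sketch: the helper name `count_tab_fields` and the demo files are hypothetical. A two-column, tab-delimited keyfile should report 2 fields on every line, while a space-delimited line reports 1.

```shell
# Hypothetical helper: for each non-empty line, print
# "line_number<TAB>number_of_tab_separated_fields".
count_tab_fields() {
    awk -F'\t' 'NF > 0 { printf "%d\t%d\n", NR, NF }' "$1"
}

# Demo with made-up keyfiles in a scratch directory.
demo_dir=$(mktemp -d)
printf 'MHC-CHM13.0.fa\tRef\n' > "$demo_dir/tabs.keyfile"
printf 'MHC-CHM13.0.fa  Ref\n' > "$demo_dir/spaces.keyfile"

count_tab_fields "$demo_dir/tabs.keyfile"     # field count is 2
count_tab_fields "$demo_dir/spaces.keyfile"   # field count is 1
```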

gsc74 commented 3 weeks ago

The file uses tabs.

gsc74 commented 3 weeks ago

What is the proper format for MHC.keyfile?

gsc74 commented 3 weeks ago

When I use the following as "MHC.keyfile":

/home/ghanshyam/test/phg/bin/MHC-CHM13.0.fa
/home/ghanshyam/test/phg/bin/MHC-HG002.1.fa
/home/ghanshyam/test/phg/bin/MHC-HG002.2.fa

prepare-assemblies fails to run. Please see the attached log:

[main] WARN net.maizegenetics.phgv2.cli.Initdb 2024-09-12 18:23:27,776: TileDB datasets already exist in folder db.
If db/gvcf_dataset or db/hvcf_dataset are not tiledb datasets, then delete and run again or chose a different base folder to house your tiledb data.
[main] INFO net.maizegenetics.phgv2.cli.PrepareAssemblies 2024-09-12 18:23:28,392: creating assembliesList, calling createParallelAnnotatedFastas
Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
    at java.base/java.util.Collections$SingletonList.get(Collections.java:4959)
    at net.maizegenetics.phgv2.cli.PrepareAssemblies.run(PrepareAssemblies.kt:63)
    at com.github.ajalt.clikt.parsers.Parser.parse(Parser.kt:306)
    at com.github.ajalt.clikt.parsers.Parser.parse(Parser.kt:319)
    at com.github.ajalt.clikt.parsers.Parser.parse(Parser.kt:40)
    at com.github.ajalt.clikt.core.CliktCommand.parse(CliktCommand.kt:458)
    at com.github.ajalt.clikt.core.CliktCommand.parse$default(CliktCommand.kt:455)
    at com.github.ajalt.clikt.core.CliktCommand.main(CliktCommand.kt:475)
    at com.github.ajalt.clikt.core.CliktCommand.main(CliktCommand.kt:482)
    at net.maizegenetics.phgv2.cli.PhgKt.main(Phg.kt:38)
[main] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:30,473: getSystemMemory: Total system memory: 201116119040 Bytes, 201.11611904 GB, 187.0379907072 GiB
[main] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:30,477: calculateNumThreadsAndRuns: systemMemory: 187.0379907072, processors: 78
[main] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:30,477: calculateNumThreadsAndRuns: totalThreadsToUse: 78
[main] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:30,477: calculateNumThreadsAndRuns: max concurrent threads: 8
[main] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:30,478: maximizeRunsAndThreads: totalConcurrentThreads: 8, totalAssemblies: 3
[main] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:30,478: maximizeRunsAndThreads: potential run/thread combinations:
[main] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:30,479: numAlignments  threadsPerAlignments
[main] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:30,479: 2  4
[main] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:30,479: 1  8
[main] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:30,479: Running 2 runs with 4 threads per runs
[main] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:30,479: calculatedNumThreadsAndRuns: returning runsAndThreads values: (2, 4)
[main] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:30,479: createCDSfromRefData command:conda run -n phgv2-conda anchorwave gff2seq -r MHC-CHM13.0.fa -i MHC_chm13_adjusted.gff3 -o output/ref.cds.fasta
[main] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:33,303: Ref minimap Command: conda run -n phgv2-conda minimap2 -x splice -t 4 -k 12 -a -p 0.4 -N20 MHC-CHM13.0.fa output/ref.cds.fasta -o output/MHC-CHM13.0.sam
[main] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:34,821: Adding entries to the inputChannel:
[main] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:34,822: Adding: /home/ghanshyam/RECOMB_25/phg/bin/MHC-CHM13.0.fa for processing
[main] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:34,828: Adding: /home/ghanshyam/RECOMB_25/phg/bin/MHC-HG002.1.fa for processing
[main] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:34,828: Adding: /home/ghanshyam/RECOMB_25/phg/bin/MHC-HG002.2.fa for processing
[DefaultDispatcher-worker-2] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:34,830: alignAssembly: asmFileFull: /home/ghanshyam/RECOMB_25/phg/bin/MHC-CHM13.0.fa, outputFile: output/MHC-CHM13.0.sam , threadsPerRun: 4
[DefaultDispatcher-worker-1] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:34,830: alignAssembly: asmFileFull: /home/ghanshyam/RECOMB_25/phg/bin/MHC-HG002.1.fa, outputFile: output/MHC-HG002.1.sam , threadsPerRun: 4
[DefaultDispatcher-worker-1] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:34,830: redirectError: output/minimap2_MHC-HG002.1_error.log
[DefaultDispatcher-worker-2] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:34,830: redirectError: output/minimap2_MHC-CHM13.0_error.log
[DefaultDispatcher-worker-2] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:34,830:  begin minimap assembly Command: conda run -n phgv2-conda minimap2 -x splice -t 4 -k 12 -a -p 0.4 -N20 /home/ghanshyam/RECOMB_25/phg/bin/MHC-CHM13.0.fa output/ref.cds.fasta -o output/MHC-CHM13.0.sam
[DefaultDispatcher-worker-1] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:34,830:  begin minimap assembly Command: conda run -n phgv2-conda minimap2 -x splice -t 4 -k 12 -a -p 0.4 -N20 /home/ghanshyam/RECOMB_25/phg/bin/MHC-HG002.1.fa output/ref.cds.fasta -o output/MHC-HG002.1.sam
[DefaultDispatcher-worker-2] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:36,351: redirectError: output/proali_MHC-CHM13.0_outputAndError.log
[DefaultDispatcher-worker-2] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:36,352: runAnchorwaveProali proali Command for MHC-CHM13.0: conda run -n phgv2-conda anchorwave proali -i MHC_chm13_adjusted.gff3 -r MHC-CHM13.0.fa -as output/ref.cds.fasta -a output/MHC-CHM13.0.sam -ar output/MHC-CHM13.0.sam -s /home/ghanshyam/RECOMB_25/phg/bin/MHC-CHM13.0.fa -n output/MHC-CHM13.0_MHC-CHM13.0.anchorspro -R 1 -Q 1 -t 4 -o output/MHC-CHM13.0.maf
[DefaultDispatcher-worker-1] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:36,372: redirectError: output/proali_MHC-HG002.1_outputAndError.log
[DefaultDispatcher-worker-1] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:36,373: runAnchorwaveProali proali Command for MHC-HG002.1: conda run -n phgv2-conda anchorwave proali -i MHC_chm13_adjusted.gff3 -r MHC-CHM13.0.fa -as output/ref.cds.fasta -a output/MHC-HG002.1.sam -ar output/MHC-CHM13.0.sam -s /home/ghanshyam/RECOMB_25/phg/bin/MHC-HG002.1.fa -n output/MHC-HG002.1_MHC-CHM13.0.anchorspro -R 1 -Q 1 -t 4 -o output/MHC-HG002.1.maf
[DefaultDispatcher-worker-2] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:41,960: outputDir for ggsave: output, plotFile=/home/ghanshyam/RECOMB_25/phg/bin/output/MHC-CHM13.0_dotplot.svg
SLF4J: No SLF4J providers were found.
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See https://www.slf4j.org/codes.html#noProviders for further details.
SLF4J: Class path contains SLF4J bindings targeting slf4j-api versions 1.7.x or earlier.
SLF4J: Ignoring binding found at [jar:file:/home/ghanshyam/RECOMB_25/phg/lib/logback-classic-1.2.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See https://www.slf4j.org/codes.html#ignoredBindings for an explanation.
[DefaultDispatcher-worker-2] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:42,699: Dot plot for MHC-CHM13.0 saved to: /home/ghanshyam/RECOMB_25/phg/bin/output/MHC-CHM13.0_dotplot.svg
[DefaultDispatcher-worker-2] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:42,699: alignAssembly: asmFileFull: /home/ghanshyam/RECOMB_25/phg/bin/MHC-HG002.2.fa, outputFile: output/MHC-HG002.2.sam , threadsPerRun: 4
[DefaultDispatcher-worker-2] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:42,699: redirectError: output/minimap2_MHC-HG002.2_error.log
[DefaultDispatcher-worker-2] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:42,699:  begin minimap assembly Command: conda run -n phgv2-conda minimap2 -x splice -t 4 -k 12 -a -p 0.4 -N20 /home/ghanshyam/RECOMB_25/phg/bin/MHC-HG002.2.fa output/ref.cds.fasta -o output/MHC-HG002.2.sam
[main] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:42,700: Done Adding data to the inputChannel:
[DefaultDispatcher-worker-2] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:44,256: redirectError: output/proali_MHC-HG002.2_outputAndError.log
[DefaultDispatcher-worker-2] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:44,257: runAnchorwaveProali proali Command for MHC-HG002.2: conda run -n phgv2-conda anchorwave proali -i MHC_chm13_adjusted.gff3 -r MHC-CHM13.0.fa -as output/ref.cds.fasta -a output/MHC-HG002.2.sam -ar output/MHC-CHM13.0.sam -s /home/ghanshyam/RECOMB_25/phg/bin/MHC-HG002.2.fa -n output/MHC-HG002.2_MHC-CHM13.0.anchorspro -R 1 -Q 1 -t 4 -o output/MHC-HG002.2.maf
[DefaultDispatcher-worker-1] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:55,447: outputDir for ggsave: output, plotFile=/home/ghanshyam/RECOMB_25/phg/bin/output/MHC-HG002.1_dotplot.svg
[DefaultDispatcher-worker-1] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:55,475: Dot plot for MHC-HG002.1 saved to: /home/ghanshyam/RECOMB_25/phg/bin/output/MHC-HG002.1_dotplot.svg
[DefaultDispatcher-worker-2] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:24:04,815: outputDir for ggsave: output, plotFile=/home/ghanshyam/RECOMB_25/phg/bin/output/MHC-HG002.2_dotplot.svg
[DefaultDispatcher-worker-2] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:24:04,839: Dot plot for MHC-HG002.2 saved to: /home/ghanshyam/RECOMB_25/phg/bin/output/MHC-HG002.2_dotplot.svg

lynnjo commented 3 weeks ago

Sorry, I was thinking of read mapping. You are running align-assemblies. The "assembly-file-list" is not a keyfile; it is a file listing assemblies, one per line, and it should have only one column. prepare-assemblies takes a keyfile, but align-assemblies takes a single-column file containing the full path to each assembly to be aligned, one assembly per line.

gsc74 commented 3 weeks ago

Can you rewrite the proper commands?

lynnjo commented 3 weeks ago

What documentation are you looking at? The documentation in the PHGv2 docs section is correct.

gsc74 commented 3 weeks ago

Quick start section of the document: https://github.com/maize-genetics/phg_v2/blob/main/docs/build_and_load.md

lynnjo commented 3 weeks ago

What version of PHGv2 are you using?

The quick start for the current version shows these commands for prepare-assemblies and align-assemblies:

phg prepare-assemblies \
    --keyfile /path/to/keyfile \
    --output-dir /path/to/updated/fastas \
    --threads 10

phg align-assemblies \
    --gff anchors.gff \
    --reference-file /my/updated/ref.fasta \
    --assembly-file-list /updated/assemblies_list.txt \
    -o /path/for/generated_files

These examples are correct. When running align-assemblies, you want a list of the FASTA files created by prepare-assemblies. Let us know if there is a change to this example that you think would make it clearer.

gsc74 commented 3 weeks ago

Thanks, I identified the issue. I was mistakenly using 'MHC.keyfile' as the --keyfile for 'prepare-assemblies' and also passing it as the --assembly-file-list for 'align-assemblies'. As you pointed out, --assembly-file-list should point to a single-column file.
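
For anyone hitting the same error, a minimal sketch of building the single-column list follows. It assumes the FASTA files written by prepare-assemblies sit directly in the output directory and end in `.fa`; neither the naming nor the extension is confirmed in this thread. The helper name `write_assembly_list` and the demo file names are hypothetical.

```shell
# Hypothetical helper: write the absolute path of every *.fa file found
# directly inside directory $1 to list file $2, one path per line.
write_assembly_list() {
    fasta_dir=$(cd "$1" && pwd) || return 1
    for fasta in "$fasta_dir"/*.fa; do
        if [ -f "$fasta" ]; then            # guards against an unmatched glob
            printf '%s\n' "$fasta"
        fi
    done > "$2"
}

# Demo with made-up file names in a scratch directory.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/output"
touch "$demo_dir/output/Hap_1.fa" "$demo_dir/output/Hap_2.fa"
write_assembly_list "$demo_dir/output" "$demo_dir/assemblies_list.txt"
cat "$demo_dir/assemblies_list.txt"
```

The resulting file can then be passed to `--assembly-file-list`, while the two-column keyfile stays with `--keyfile` for prepare-assemblies. The helper picks up every `.fa` file in the directory, so remove any entry that should not be aligned.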