gsc74 closed this issue 3 weeks ago
@gsc74 Please go to your output folder and look for the minimap2 logs. They will provide more information on what is wrong. Was it able to find the file to align? You may need the full path name in your keyfile.
[ERROR] failed to open file 'MHC-HG002.1.fa Hap_1': No such file or directory
ERROR conda.cli.main_run:execute(125): conda run minimap2 -x splice -t 4 -k 12 -a -p 0.4 -N20 MHC-HG002.1.fa Hap_1 output/ref.cds.fasta -o output/MHC-HG002.1.sam
failed. (See above for error)
I tried providing the full path as well; the error is the same:
[ERROR] failed to open file '/home/ghanshyam/test/phg/bin/MHC-HG002.1.fa Hap_1': No such file or directory
ERROR conda.cli.main_run:execute(125): conda run minimap2 -x splice -t 4 -k 12 -a -p 0.4 -N20 /home/ghanshyam/test/phg/bin/MHC-HG002.1.fa Hap_1 output/ref.cds.fasta -o output/MHC-HG002.1.sam
failed. (See above for error)
I believe the issue lies with the "MHC.keyfile." I noticed that the reference name is given as '/home/ghanshyam/test/phg/bin/MHC-HG002.1.fa Hap_1', but only '/home/ghanshyam/test/phg/bin/MHC-HG002.1.fa' exists as a file. It seems the tool is reading the entire line as a file path during alignment instead of just reading column 1 from the MHC.keyfile.
It looks like minimap2 is taking both columns together as the file name. Verify you have a tab vs a space between your columns in the keyfile.
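One way to check the delimiter is to count tab-separated fields per line. This is only a sketch: the sample file names and contents below are made up for illustration, and `count_columns` is a hypothetical helper, not part of PHG.

```shell
# count_columns FILE: print the number of tab-separated fields on each line.
count_columns() {
    awk -F'\t' '{ print NF }' "$1"
}

# Hypothetical sample files, for illustration only.
printf '/path/to/asm.fa\tHap_1\n' > tab_sample.txt    # real tab between columns
printf '/path/to/asm.fa Hap_1\n'  > space_sample.txt  # single space instead

count_columns tab_sample.txt     # prints 2: the columns are tab-separated
count_columns space_sample.txt   # prints 1: a space is not a column break
```

If a two-column keyfile reports 1 on every line, the columns are separated by spaces rather than tabs.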
The file is tab-delimited.
What is the proper format for MHC.keyfile?
When I use the following as "MHC.keyfile":
/home/ghanshyam/test/phg/bin/MHC-CHM13.0.fa
/home/ghanshyam/test/phg/bin/MHC-HG002.1.fa
/home/ghanshyam/test/phg/bin/MHC-HG002.2.fa
prepare-assemblies fails to run; please see the attached log:
[main] WARN net.maizegenetics.phgv2.cli.Initdb 2024-09-12 18:23:27,776: TileDB datasets already exist in folder db.
If db/gvcf_dataset or db/hvcf_dataset are not tiledb datasets, then delete and run again or chose a different base folder to house your tiledb data.
[main] INFO net.maizegenetics.phgv2.cli.PrepareAssemblies 2024-09-12 18:23:28,392: creating assembliesList, calling createParallelAnnotatedFastas
Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
at java.base/java.util.Collections$SingletonList.get(Collections.java:4959)
at net.maizegenetics.phgv2.cli.PrepareAssemblies.run(PrepareAssemblies.kt:63)
at com.github.ajalt.clikt.parsers.Parser.parse(Parser.kt:306)
at com.github.ajalt.clikt.parsers.Parser.parse(Parser.kt:319)
at com.github.ajalt.clikt.parsers.Parser.parse(Parser.kt:40)
at com.github.ajalt.clikt.core.CliktCommand.parse(CliktCommand.kt:458)
at com.github.ajalt.clikt.core.CliktCommand.parse$default(CliktCommand.kt:455)
at com.github.ajalt.clikt.core.CliktCommand.main(CliktCommand.kt:475)
at com.github.ajalt.clikt.core.CliktCommand.main(CliktCommand.kt:482)
at net.maizegenetics.phgv2.cli.PhgKt.main(Phg.kt:38)
[main] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:30,473: getSystemMemory: Total system memory: 201116119040 Bytes, 201.11611904 GB, 187.0379907072 GiB
[main] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:30,477: calculateNumThreadsAndRuns: systemMemory: 187.0379907072, processors: 78
[main] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:30,477: calculateNumThreadsAndRuns: totalThreadsToUse: 78
[main] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:30,477: calculateNumThreadsAndRuns: max concurrent threads: 8
[main] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:30,478: maximizeRunsAndThreads: totalConcurrentThreads: 8, totalAssemblies: 3
[main] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:30,478: maximizeRunsAndThreads: potential run/thread combinations:
[main] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:30,479: numAlignments threadsPerAlignments
[main] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:30,479: 2 4
[main] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:30,479: 1 8
[main] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:30,479: Running 2 runs with 4 threads per runs
[main] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:30,479: calculatedNumThreadsAndRuns: returning runsAndThreads values: (2, 4)
[main] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:30,479: createCDSfromRefData command:conda run -n phgv2-conda anchorwave gff2seq -r MHC-CHM13.0.fa -i MHC_chm13_adjusted.gff3 -o output/ref.cds.fasta
[main] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:33,303: Ref minimap Command: conda run -n phgv2-conda minimap2 -x splice -t 4 -k 12 -a -p 0.4 -N20 MHC-CHM13.0.fa output/ref.cds.fasta -o output/MHC-CHM13.0.sam
[main] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:34,821: Adding entries to the inputChannel:
[main] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:34,822: Adding: /home/ghanshyam/RECOMB_25/phg/bin/MHC-CHM13.0.fa for processing
[main] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:34,828: Adding: /home/ghanshyam/RECOMB_25/phg/bin/MHC-HG002.1.fa for processing
[main] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:34,828: Adding: /home/ghanshyam/RECOMB_25/phg/bin/MHC-HG002.2.fa for processing
[DefaultDispatcher-worker-2] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:34,830: alignAssembly: asmFileFull: /home/ghanshyam/RECOMB_25/phg/bin/MHC-CHM13.0.fa, outputFile: output/MHC-CHM13.0.sam , threadsPerRun: 4
[DefaultDispatcher-worker-1] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:34,830: alignAssembly: asmFileFull: /home/ghanshyam/RECOMB_25/phg/bin/MHC-HG002.1.fa, outputFile: output/MHC-HG002.1.sam , threadsPerRun: 4
[DefaultDispatcher-worker-1] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:34,830: redirectError: output/minimap2_MHC-HG002.1_error.log
[DefaultDispatcher-worker-2] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:34,830: redirectError: output/minimap2_MHC-CHM13.0_error.log
[DefaultDispatcher-worker-2] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:34,830: begin minimap assembly Command: conda run -n phgv2-conda minimap2 -x splice -t 4 -k 12 -a -p 0.4 -N20 /home/ghanshyam/RECOMB_25/phg/bin/MHC-CHM13.0.fa output/ref.cds.fasta -o output/MHC-CHM13.0.sam
[DefaultDispatcher-worker-1] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:34,830: begin minimap assembly Command: conda run -n phgv2-conda minimap2 -x splice -t 4 -k 12 -a -p 0.4 -N20 /home/ghanshyam/RECOMB_25/phg/bin/MHC-HG002.1.fa output/ref.cds.fasta -o output/MHC-HG002.1.sam
[DefaultDispatcher-worker-2] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:36,351: redirectError: output/proali_MHC-CHM13.0_outputAndError.log
[DefaultDispatcher-worker-2] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:36,352: runAnchorwaveProali proali Command for MHC-CHM13.0: conda run -n phgv2-conda anchorwave proali -i MHC_chm13_adjusted.gff3 -r MHC-CHM13.0.fa -as output/ref.cds.fasta -a output/MHC-CHM13.0.sam -ar output/MHC-CHM13.0.sam -s /home/ghanshyam/RECOMB_25/phg/bin/MHC-CHM13.0.fa -n output/MHC-CHM13.0_MHC-CHM13.0.anchorspro -R 1 -Q 1 -t 4 -o output/MHC-CHM13.0.maf
[DefaultDispatcher-worker-1] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:36,372: redirectError: output/proali_MHC-HG002.1_outputAndError.log
[DefaultDispatcher-worker-1] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:36,373: runAnchorwaveProali proali Command for MHC-HG002.1: conda run -n phgv2-conda anchorwave proali -i MHC_chm13_adjusted.gff3 -r MHC-CHM13.0.fa -as output/ref.cds.fasta -a output/MHC-HG002.1.sam -ar output/MHC-CHM13.0.sam -s /home/ghanshyam/RECOMB_25/phg/bin/MHC-HG002.1.fa -n output/MHC-HG002.1_MHC-CHM13.0.anchorspro -R 1 -Q 1 -t 4 -o output/MHC-HG002.1.maf
[DefaultDispatcher-worker-2] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:41,960: outputDir for ggsave: output, plotFile=/home/ghanshyam/RECOMB_25/phg/bin/output/MHC-CHM13.0_dotplot.svg
SLF4J: No SLF4J providers were found.
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See https://www.slf4j.org/codes.html#noProviders for further details.
SLF4J: Class path contains SLF4J bindings targeting slf4j-api versions 1.7.x or earlier.
SLF4J: Ignoring binding found at [jar:file:/home/ghanshyam/RECOMB_25/phg/lib/logback-classic-1.2.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See https://www.slf4j.org/codes.html#ignoredBindings for an explanation.
[DefaultDispatcher-worker-2] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:42,699: Dot plot for MHC-CHM13.0 saved to: /home/ghanshyam/RECOMB_25/phg/bin/output/MHC-CHM13.0_dotplot.svg
[DefaultDispatcher-worker-2] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:42,699: alignAssembly: asmFileFull: /home/ghanshyam/RECOMB_25/phg/bin/MHC-HG002.2.fa, outputFile: output/MHC-HG002.2.sam , threadsPerRun: 4
[DefaultDispatcher-worker-2] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:42,699: redirectError: output/minimap2_MHC-HG002.2_error.log
[DefaultDispatcher-worker-2] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:42,699: begin minimap assembly Command: conda run -n phgv2-conda minimap2 -x splice -t 4 -k 12 -a -p 0.4 -N20 /home/ghanshyam/RECOMB_25/phg/bin/MHC-HG002.2.fa output/ref.cds.fasta -o output/MHC-HG002.2.sam
[main] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:42,700: Done Adding data to the inputChannel:
[DefaultDispatcher-worker-2] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:44,256: redirectError: output/proali_MHC-HG002.2_outputAndError.log
[DefaultDispatcher-worker-2] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:44,257: runAnchorwaveProali proali Command for MHC-HG002.2: conda run -n phgv2-conda anchorwave proali -i MHC_chm13_adjusted.gff3 -r MHC-CHM13.0.fa -as output/ref.cds.fasta -a output/MHC-HG002.2.sam -ar output/MHC-CHM13.0.sam -s /home/ghanshyam/RECOMB_25/phg/bin/MHC-HG002.2.fa -n output/MHC-HG002.2_MHC-CHM13.0.anchorspro -R 1 -Q 1 -t 4 -o output/MHC-HG002.2.maf
[DefaultDispatcher-worker-1] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:55,447: outputDir for ggsave: output, plotFile=/home/ghanshyam/RECOMB_25/phg/bin/output/MHC-HG002.1_dotplot.svg
[DefaultDispatcher-worker-1] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:23:55,475: Dot plot for MHC-HG002.1 saved to: /home/ghanshyam/RECOMB_25/phg/bin/output/MHC-HG002.1_dotplot.svg
[DefaultDispatcher-worker-2] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:24:04,815: outputDir for ggsave: output, plotFile=/home/ghanshyam/RECOMB_25/phg/bin/output/MHC-HG002.2_dotplot.svg
[DefaultDispatcher-worker-2] INFO net.maizegenetics.phgv2.cli.AlignAssemblies 2024-09-12 18:24:04,839: Dot plot for MHC-HG002.2 saved to: /home/ghanshyam/RECOMB_25/phg/bin/output/MHC-HG002.2_dotplot.svg
Sorry, I was thinking of read mapping. You are running align-assemblies. The "assembly-file-list" is not a keyfile; it is a file with a list of assemblies, one per line, so there should be only one column. prepare-assemblies takes a keyfile, but align-assemblies takes a single-column file containing the full path to each assembly to be aligned, one assembly per line.
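A quick sanity check on the list file before running align-assemblies might look like the sketch below. `check_assembly_list` is a hypothetical helper (not a PHG command), and the demo files are placeholders. It flags any line that contains whitespace, which would indicate a second column, and any path that does not exist. The demo uses relative names for brevity; a real list should hold full paths, and paths that themselves contain spaces would be flagged by this check.

```shell
# check_assembly_list FILE: return non-zero if any line has more than one
# whitespace-separated field or names a file that does not exist.
check_assembly_list() {
    rc=0
    while IFS= read -r line; do
        [ -z "$line" ] && continue                      # skip blank lines
        case "$line" in
            *[[:space:]]*)
                echo "extra column or whitespace: $line" >&2; rc=1 ;;
            *)
                [ -f "$line" ] || { echo "no such file: $line" >&2; rc=1; } ;;
        esac
    done < "$1"
    return "$rc"
}

# Demo with hypothetical placeholder files.
touch demo_asm.fa
printf 'demo_asm.fa\n'        > good_list.txt   # one column
printf 'demo_asm.fa\tHap_1\n' > bad_list.txt    # keyfile-style, two columns

check_assembly_list good_list.txt && echo "good_list.txt looks fine"
check_assembly_list bad_list.txt  || echo "bad_list.txt needs fixing"
```

A keyfile-style line such as the one in bad_list.txt is exactly what produced the "failed to open file '... Hap_1'" error above.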
Can you rewrite the proper commands?
What documentation are you looking at? The documentation in the PHGv2 docs section is correct.
Quick start section of the document: https://github.com/maize-genetics/phg_v2/blob/main/docs/build_and_load.md
What version of PHGv2 are you using?
Looking at the current version quick start it shows these commands for prepare-assemblies and align-assemblies:
phg prepare-assemblies \
--keyfile /path/to/keyfile \
--output-dir /path/to/updated/fastas \
--threads 10
phg align-assemblies \
--gff anchors.gff \
--reference-file /my/updated/ref.fasta \
--assembly-file-list /updated/assemblies_list.txt \
-o /path/for/generated_files
These examples are correct. When running align-assemblies, you would want a list of the fasta files created by prepare-assemblies. Let us know if there is a change to this example that you think would make it clearer.
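As a rough sketch, the list for --assembly-file-list could be built from the prepare-assemblies output directory along these lines. `make_assembly_list` is a hypothetical helper, the directory and file names are placeholders, and the `.fa` suffix is an assumption about how the updated files are named, so adjust the pattern to match what is actually in your output directory.

```shell
# make_assembly_list DIR OUT: write the absolute path of every *.fa file in DIR
# to OUT, one per line. The .fa suffix is an assumption; adjust as needed.
make_assembly_list() {
    dir=$(cd "$1" && pwd) || return 1     # resolve DIR to an absolute path
    find "$dir" -maxdepth 1 -type f -name '*.fa' | sort > "$2"
}

# Demo with hypothetical, empty placeholder files.
mkdir -p demo_updated_fastas
touch demo_updated_fastas/A.fa demo_updated_fastas/B.fa
make_assembly_list demo_updated_fastas demo_assemblies_list.txt
cat demo_assemblies_list.txt              # one full path per line, single column
```

The resulting file is what would be passed to --assembly-file-list, while the two-column keyfile stays with prepare-assemblies.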
Thanks, I identified the issue. I was mistakenly using 'MHC.keyfile' as the --keyfile for 'prepare-assemblies', and also providing it as the --assembly-file-list for 'align-assemblies'. As you pointed out, the --assembly-file-list should only contain a single-column file.
Description
@tcasstevens, I'm running phg v2 with the following commands.
MHC.keyfile looks like
With the MHC.keyfile, prepare-assemblies runs well but align-assemblies fails to run. Please check the attached log.
Expected behavior
No response
PHG version
2.4.7.161