maize-genetics / phg_v2

Practical Haplotype Graph (PHG) version 2
https://phg.maizegenetics.net/
Apache License 2.0

[BUG]: Process Builder Error 132 in AGC compress / Illegal instruction (core dumped) #244

Closed pmvijar closed 2 weeks ago

pmvijar commented 3 weeks ago

Description

Building and Loading Phase AGC compress error

Hello, I am trying out the software on rice data and ran into an error in AGC compress. For reference, I am using the IR 64 and Azucena genomes, with IRGSP from NCBI as my reference.

Error from the terminal

[main] INFO net.maizegenetics.phgv2.cli.AgcCompress 2024-10-21 04:28:13,074: Starting AGC compression: validate the URI
[main] INFO net.maizegenetics.phgv2.utils.VariantLoadingUtils 2024-10-21 04:28:13,085: begin Command:conda run -n phgv2-conda tiledbvcf stat --uri vcf_dbs/hvcf_dataset
[main] INFO net.maizegenetics.phgv2.utils.VariantLoadingUtils 2024-10-21 04:28:14,468: Using TileDB datasets created in folder vcf_dbs.
[main] INFO net.maizegenetics.phgv2.cli.AgcCompress 2024-10-21 04:28:14,474: Verifying FASTA files id lines are annotated
[main] INFO net.maizegenetics.phgv2.cli.AgcCompress 2024-10-21 04:28:14,478: VerifyFileAnnotation: time: 0.00253612 secs.
[main] INFO net.maizegenetics.phgv2.cli.AgcCompress 2024-10-21 04:28:14,479: calling loadAGCFiles
[main] INFO net.maizegenetics.phgv2.cli.AgcCompress 2024-10-21 04:28:14,479: begin Command to create/append:conda run -n phgv2-conda agc create -i data/assemblies_list.txt -o vcf_dbs/assemblies.agc output/updated_assemblies/IRGSPRef.fa
[main] ERROR net.maizegenetics.phgv2.cli.AgcCompress 2024-10-21 04:28:15,994: agc create run via ProcessBuilder returned error code 132
[main] ERROR net.maizegenetics.phgv2.cli.AgcCompress 2024-10-21 04:28:15,994: Error: could not create agc compressed file.
Exception in thread "main" java.lang.IllegalArgumentException: Error running ProcessBuilder for agc create or append: Error running ProcessBuilder to compress agc files: 132
    at net.maizegenetics.phgv2.cli.AgcCompress.loadAGCFiles(AgcCompress.kt:214)
    at net.maizegenetics.phgv2.cli.AgcCompress.processAGCFiles(AgcCompress.kt:123)
    at net.maizegenetics.phgv2.cli.AgcCompress.run(AgcCompress.kt:76)
    at com.github.ajalt.clikt.parsers.Parser.parse(Parser.kt:306)
    at com.github.ajalt.clikt.parsers.Parser.parse(Parser.kt:319)
    at com.github.ajalt.clikt.parsers.Parser.parse(Parser.kt:40)
    at com.github.ajalt.clikt.core.CliktCommand.parse(CliktCommand.kt:458)
    at com.github.ajalt.clikt.core.CliktCommand.parse$default(CliktCommand.kt:455)
    at com.github.ajalt.clikt.core.CliktCommand.main(CliktCommand.kt:475)
    at com.github.ajalt.clikt.core.CliktCommand.main(CliktCommand.kt:482)
    at net.maizegenetics.phgv2.cli.PhgKt.main(Phg.kt:38)
Command exited with non-zero status 1

Here is the agc_create_error.log content

/tmp/tmpza56jx6g: line 3: 71174 Illegal instruction (core dumped) agc create -i data/assemblies_list.txt -o vcf_dbs/assemblies.agc output/updated_assemblies/IRGSPRef.fa

Previous command: prepare-assemblies completed without errors after I reran it.

[main] INFO net.maizegenetics.phgv2.cli.PrepareAssemblies 2024-10-08 04:51:36,768: creating assembliesList, calling createParallelAnnotatedFastas
[main] INFO net.maizegenetics.phgv2.cli.PrepareAssemblies 2024-10-08 04:51:36,840: Adding entries to the inputChannel:
[main] INFO net.maizegenetics.phgv2.cli.PrepareAssemblies 2024-10-08 04:51:36,842: adding IRGSPRef to the inputChannel
[main] INFO net.maizegenetics.phgv2.cli.PrepareAssemblies 2024-10-08 04:51:36,843: adding SeqAzucena to the inputChannel
[main] INFO net.maizegenetics.phgv2.cli.PrepareAssemblies 2024-10-08 04:51:36,843: adding SeqIR64 to the inputChannel
[main] INFO net.maizegenetics.phgv2.cli.PrepareAssemblies 2024-10-08 04:51:36,843: Done adding data to the inputChannel
annotateFasta: entry = IRGSPRef
annotateFasta: entry = SeqIR64
annotateFasta: entry = SeqAzucena

Files used

Here are the txt files used to annotate.

annotation_keyfile.txt assemblies_list.txt

Other files can be downloaded through these links:

Reference Genome: Nipponbare IRGSP SNP REF Download Link: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/001/433/935/GCF_001433935.1_IRGSP-1.0/GCF_001433935.1_IRGSP-1.0_genomic.fna.gz
Anchorwave GFF file: Nipponbare IRGSP SNP GFF Download Link: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/001/433/935/GCF_001433935.1_IRGSP-1.0/GCF_001433935.1_IRGSP-1.0_genomic.gff.gz
Short Sequence Variants(1): Azucena SNP Download Link: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/009/830/595/GCA_009830595.1_AzucenaRS1/GCA_009830595.1_AzucenaRS1_genomic.fna.gz
Short Sequence Variants(2): IR 64 SNP Download Link: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/009/914/875/GCA_009914875.1_OsIR64RS1/GCA_009914875.1_OsIR64RS1_genomic.fna.gz

Thank you!

Expected behavior

Creation of the assemblies.agc in the vcf_dbs directory

PHG version

2.4.7.161

aberthel commented 3 weeks ago

It looks like AGC didn't install properly from the conda environment. You can confirm this by trying to run an AGC command via the command line in the phgv2-conda environment, e.g.:

agc create ref.fa in1.fa in2.fa > col.agc

I expect you'll get the same "Illegal instruction (core dumped)" error message.
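For context on the number itself: shells conventionally report a process killed by a signal as 128 plus the signal number, and on Linux signal 4 is SIGILL, so error code 132 is consistent with the "Illegal instruction" message in agc_create_error.log. A minimal Python sketch of that arithmetic follows; the function name `describe_exit_status` is made up for illustration and is not part of PHG or AGC.

```python
import signal


def describe_exit_status(code: int) -> str:
    """Interpret a shell-style exit status.

    Assumes the common shell convention that a status above 128
    means "terminated by signal (status - 128)".
    """
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # The number does not map to a signal known on this platform.
            return f"killed by unknown signal {signum}"
        return f"killed by {name} (signal {signum})"
    return f"exited normally with status {code}"


if __name__ == "__main__":
    # 132 is the code reported by the ProcessBuilder error in this issue.
    print(describe_exit_status(132))
```

The signal-number mapping is platform dependent; the sketch relies on Python's `signal.Signals` enum for whatever system it runs on.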

What platform and architecture are you working on? There isn't a Bioconda build released for ARM (M-series) Macs yet, but you can build AGC from source if you need to work on an ARM machine.

pmvijar commented 2 weeks ago

Hello, I just checked, and you are right: it did not install correctly. I was using an x64 architecture on Ubuntu. The error seems to point to AVX instruction support on the CPU. I found this on the following issue page: https://github.com/refresh-bio/agc/issues/2. Do note that I have not verified this to be the case.

I am still looking for a workaround at the moment.
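One way to check the AVX hypothesis on a Linux machine is to look at the CPU feature flags the kernel exposes in /proc/cpuinfo. The sketch below is only an illustration: the helper names `cpu_flags` and `simd_report` are hypothetical, and the list of instruction sets checked is an assumption, not a statement of what the Bioconda AGC build actually requires.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set:
    """Return the feature flags from the first 'flags' line of cpuinfo text.

    x86 Linux kernels list features on a line such as 'flags : fpu sse2 avx'.
    Other architectures use different keys, in which case this returns an
    empty set.
    """
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def simd_report(flags: set) -> dict:
    """Report which of a few common SIMD extensions appear in the flag set."""
    # Assumption: these are the extensions worth checking; adjust as needed.
    return {name: name in flags for name in ("sse2", "sse4_1", "avx", "avx2")}


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        print(simd_report(cpu_flags(path.read_text())))
    else:
        print("/proc/cpuinfo not found; this check only applies to Linux")
```

If `avx` or `avx2` comes back as False on the machine where agc crashes and True on the machine where it works, that would support the explanation, though it would not prove it.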

pmvijar commented 2 weeks ago

Hello, I found a way to circumvent the agc problem for now by using another computer, but I am now getting another error in create ranges. Would it be better to open a separate issue for it? Could it be related to the files I used?

For now this is the issue:

java.lang.IllegalArgumentException: createFlankingList: chrom NC_001320.1 not found in reference fasta.
    at net.maizegenetics.phgv2.utils.GeneralUtilitiesKt.createFlankingList(GeneralUtilities.kt:88)
    at net.maizegenetics.phgv2.cli.CreateRanges.run(CreateRanges.kt:345)
    at com.github.ajalt.clikt.parsers.Parser.parse(Parser.kt:279)
    at com.github.ajalt.clikt.parsers.Parser.parse(Parser.kt:292)
    at com.github.ajalt.clikt.parsers.Parser.parse(Parser.kt:41)
    at com.github.ajalt.clikt.core.CliktCommand.parse(CliktCommand.kt:457)
    at com.github.ajalt.clikt.core.CliktCommand.parse$default(CliktCommand.kt:454)
    at com.github.ajalt.clikt.core.CliktCommand.main(CliktCommand.kt:474)
    at com.github.ajalt.clikt.core.CliktCommand.main(CliktCommand.kt:481)
    at net.maizegenetics.phgv2.cli.PhgKt.main(Phg.kt:58)
Exception in thread "main" java.lang.IllegalArgumentException: ERROR - createFlankingList faulted with message: createFlankingList: chrom NC_001320.1 not found in reference fasta.
    at net.maizegenetics.phgv2.utils.GeneralUtilitiesKt.createFlankingList(GeneralUtilities.kt:108)
    at net.maizegenetics.phgv2.cli.CreateRanges.run(CreateRanges.kt:345)
    at com.github.ajalt.clikt.parsers.Parser.parse(Parser.kt:279)
    at com.github.ajalt.clikt.parsers.Parser.parse(Parser.kt:292)
    at com.github.ajalt.clikt.parsers.Parser.parse(Parser.kt:41)
    at com.github.ajalt.clikt.core.CliktCommand.parse(CliktCommand.kt:457)
    at com.github.ajalt.clikt.core.CliktCommand.parse$default(CliktCommand.kt:454)
    at com.github.ajalt.clikt.core.CliktCommand.main(CliktCommand.kt:474)
    at com.github.ajalt.clikt.core.CliktCommand.main(CliktCommand.kt:481)
    at net.maizegenetics.phgv2.cli.PhgKt.main(Phg.kt:58)

lynnjo commented 2 weeks ago

Yes, problems with different parts of the code should be written as separate issues.

The error above shows an inconsistency in your files. It appears you have either the wrong reference FASTA or the wrong GFF: the sequence names in your reference FASTA do not match those in the GFF. Please check your files and open a new issue if you still see a problem - thanks.
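A quick way to check for this kind of mismatch is to compare the sequence IDs in the FASTA header lines with the first column (seqid) of the GFF. The sketch below works on lines of text so it can be tried on toy data; the function names and the toy records are hypothetical, and it is not how PHG itself performs the check.

```python
def fasta_ids(lines):
    """Collect sequence IDs: the first whitespace-delimited token after '>'."""
    return {
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and line[1:].strip()
    }


def gff_seqids(lines):
    """Collect the seqid (column 1) of every feature line in a GFF file."""
    ids = set()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; no more features
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives, and blank lines
        ids.add(line.split("\t", 1)[0])
    return ids


def missing_from_fasta(fasta_lines, gff_lines):
    """Return GFF seqids that have no matching FASTA record, sorted."""
    return sorted(gff_seqids(gff_lines) - fasta_ids(fasta_lines))


if __name__ == "__main__":
    # Toy data for illustration only, not taken from the files in this issue.
    fasta = [">chr1 example description", "ACGT"]
    gff = [
        "##gff-version 3",
        "chr1\tsrc\tgene\t1\t4\t.\t+\t.\tID=g1",
        "NC_001320.1\tsrc\tgene\t1\t4\t.\t+\t.\tID=g2",
    ]
    print(missing_from_fasta(fasta, gff))
```

For real files, pass open file handles (for example from `gzip.open(path, "rt")` for the .gz downloads) in place of the toy lists. Any ID the function returns is a candidate for the "not found in reference fasta" error.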

aberthel commented 2 weeks ago

@pmvijar regarding the original AGC illegal-instruction issue, if you're willing to compile the binary yourself you should be able to get it working on the first computer. AGC provides a "no-avx" makefile; alternatively, using the standard makefile with PLATFORM=SSE2 may also work. These more specialized builds aren't available through conda, unfortunately, so it's up to you whether it's worth the effort to compile them for this machine.

pmvijar commented 2 weeks ago

The file errors were resolved.

For the agc, I'll be migrating the data to a compatible server.

Thank you!~