liampshaw / mobile-gene-regions

Analysing the genomic context of mobile genes

Issues with running the pipeline #14

Open bhargava-morampalli opened 8 months ago

bhargava-morampalli commented 8 months ago

I have been having issues running the pipeline, mostly at the step where it runs pangraph. I can run pangraph separately, but even after modifying the script (calling pangraph using the Julia package caller), it keeps reporting that it cannot access the Julia package dependencies needed to run pangraph. I kept reloading the packages right before running the pipeline, but now I am stuck on an error relating to 'version.Jl': Nothing is not callable. Please let me know if there is a correct way to install pangraph so that it works with the pipeline properly.

Thank you.

liampshaw commented 8 months ago

Thanks for flagging this issue - apologies that you've had difficulties running the pipeline. I'm not immediately sure what is going on here but I'll do my best to look into it this week and get back to you.

Do you have some example data files so I can experiment trying to reproduce the error? (seems unlikely that it's input file related though, from what you say) And what system are you running on?

bhargava-morampalli commented 8 months ago

Hello, thanks for getting back. I just tried reinstalling pangraph from the branch mentioned in the repository and it executed without any errors. I compiled from the local binaries. When I run the pipeline, this is the error I get:

[Screenshot 2023-11-03 at 1 40 55 pm]

I am using Julia version 1.7.2, as mentioned in the pangraph repository. I am also including the example files so you can try replicating the issue. The command I used:

python analyze-flanking-regions.py \
    --contigs input/contigs/imp4_contigs.fa \
    --gene_fasta input/focal_genes/imp4.fa \
    --focal_gene_name imp4 \
    --gff input/gffs/imp4_annotations.gff \
    --output_dir output

Please let me know if you need any more information.

liampshaw commented 8 months ago

Thanks for the details. Apologies that the details about needing Julia 1.7.2 weren't clear from this repo.

Looking at your output, these lines:

16 contigs in input fasta
0 regions extracted (from contigs with one blast hit for gene)

suggest that the extracted regions fasta being passed to pangraph is empty. The "attempt to access 0-element vector" message is the error pangraph throws when this happens. (TO DO: the pipeline should exit with a clear warning if no contigs pass the filtering criterion at this stage, rather than giving this cryptic error.)
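As a rough illustration of that TO DO, here is a minimal sketch of a check that could run before pangraph is called. The function names (`count_fasta_records`, `check_regions_extracted`) and the wording of the message are hypothetical and not taken from the pipeline's code. It assumes the extracted regions are written to a single fasta file whose path is known at that point.

```python
import os
import sys


def count_fasta_records(fasta_path):
    """Count records in a fasta file by counting header lines (starting with '>')."""
    if not os.path.exists(fasta_path):
        return 0
    with open(fasta_path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def check_regions_extracted(fasta_path):
    """Hypothetical guard: stop with a readable message if no regions were extracted."""
    n_records = count_fasta_records(fasta_path)
    if n_records == 0:
        # sys.exit with a string prints it to stderr and exits with status 1
        sys.exit(
            f"ERROR: 0 regions extracted into {fasta_path}. "
            "No contigs passed the filtering step (one blast hit for the focal gene), "
            "so there is nothing to pass to pangraph."
        )
    return n_records
```

Calling `check_regions_extracted(path)` just before the pangraph step would stop the run early with an explanation, instead of letting pangraph fail on an empty input.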

I don't know exactly why the extracted regions fasta is empty. If you send your input files I'd be very happy to try running them myself. Feel free to email liam.philip.shaw@gmail.com if you don't want to post them here :)