hartwigmedical / hmftools

Various algorithms for analysing genomics data
GNU General Public License v3.0
purple initialisation error #350

Closed lichennan123 closed 1 year ago

lichennan123 commented 1 year ago

Hello Hartwig team,

Thank you for making such an excellent toolkit! I successfully ran cobalt, amber, and gridss/gripss on my WGS data. However, running the latest version of PURPLE (v3.7.2) always fails with the error 'initialisation error, exiting', and I could not find any details about what went wrong. Interestingly, an older version (v3.1) works without this problem on the same set of input files (though different versions seem to require slightly different naming of input files). Do you have any idea what this initialisation error indicates, and any suggestions for resolving it? Thank you.

Sincerely, Chennan

charlesshale commented 1 year ago

Hi Chennan,

Could you upload the command that you're using to call Purple?

thanks,

Charles.

lichennan123 commented 1 year ago

Hi Charles,

Here is the code that I used -

ml java R/4.1
export JAR=/data/lic27/tools/purple/purple_v3.7.2.jar # the latest version .jar file
export AMBER=/data/lic27/project_clon_evo/practice/27_linx/02_amber
export COBALT=/data/lic27/project_clon_evo/practice/27_linx/01_cobalt
export GC=/data/lic27/project_clon_evo/practice/tools/GC_profile.hg38.1000bp.cnp # same as the gc profile used in cobalt
export REF=/fdb/GATK_resource_bundle/hg38-v0/Homo_sapiens_assembly38.fasta
export CIRCOS=/data/lic27/tools/circos/circos-0.69-9/bin/circos
export OUT=/data/lic27/project_clon_evo/practice/27_linx/04_purple/
export SV=/data/lic27/project_clon_evo/practice/27_linx/031_gripss/DTB-090-PRO.gripss.filtered.vcf.gz
export ALLSV=/data/lic27/project_clon_evo/practice/27_linx/031_gripss/DTB-090-PRO.gripss.vcf.gz
export ENSEMBL=/fdb/ensembl/pub/release-108/fasta/homo_sapiens/dna/ # an ensembl data directory for hg38
java -jar $JAR \
   -reference DTB-090-N \
   -tumor DTB-090-PRO \
   -amber $AMBER \
   -cobalt $COBALT \
   -gc_profile $GC \
   -ref_genome $REF \
   -ref_genome_version 38 \
   -ensembl_data_dir $ENSEMBL \
   -structural_vcf $SV \
   -sv_recovery_vcf $ALLSV \
   -circos $CIRCOS \
   -output_dir $OUT

The reference/tumor names are consistent with those used in prior steps. Let me know if you need clarification on anything! Thanks for helping.

Chennan

bobrad98 commented 1 year ago

Hi Chennan,

I'm having the same problem as you. Did you manage to find a solution?

lichennan123 commented 1 year ago

Yes! I ended up using an older version and it worked very well. Here is the code -

ml java R/4.1
export JAR=/data/lic27/tools/purple/purple_v3.1.jar  # version 3.1
export AMBER=/data/lic27/project_clon_evo/practice/27_linx/02_amber
export COBALT=/data/lic27/project_clon_evo/practice/27_linx/01_cobalt
export GC=/data/lic27/project_clon_evo/practice/tools/GC_profile.hg38.1000bp.cnp
export REF=/fdb/GATK_resource_bundle/hg38-v0/Homo_sapiens_assembly38.fasta
export CIRCOS=/data/lic27/tools/circos/circos-0.69-9/bin/circos
export OUT=/data/lic27/project_clon_evo/practice/27_linx/04_purple/
export SV=/data/lic27/project_clon_evo/practice/27_linx/031_gripss/DTB-090-PRO.gripss.filtered.vcf.gz
export ALLSV=/data/lic27/project_clon_evo/practice/27_linx/031_gripss/DTB-090-PRO.gripss.vcf.gz
export SOMATIC=/data/lic27/project_clon_evo/practice/17_mutect2_pon/DTB-090-PRO/DTB-090-PRO.MuTect2.pass.vcf

java -jar $JAR \
   -reference DTB-090-N \
   -tumor DTB-090-PRO \
   -amber $AMBER \
   -cobalt $COBALT \
   -gc_profile $GC \
   -ref_genome $REF \
   -ref_genome_version 38 \
   -structural_vcf $SV \
   -sv_recovery_vcf $ALLSV \
   -somatic_vcf $SOMATIC \
   -circos $CIRCOS \
   -output_dir $OUT

Good luck! Chennan

charlesshale commented 1 year ago

Could you paste me the log from Purple up to when it hits the initialisation error?

There should be no need to revert to any earlier version of Purple.

What are the contents for: /fdb/ensembl/pub/release-108/fasta/homo_sapiens/dna/

?

These should be the Hartwig Ensembl files from our resource page.

thanks.

bobrad98 commented 1 year ago

Hi Charles,

I'm having the same problem as Chennan, so I wanted to ask you a couple of things:

1) Is this the Ensembl data which should be used?

2) After digging through the code, I found that versions > 3.3 and < 3.3 differ in the isValid method of [ReferenceData](https://github.com/hartwigmedical/hmftools/blob/46aa145e83f422a25412e3918d1aff7203c38c10/purple/src/main/java/com/hartwig/hmftools/purple/config/ReferenceData.java#L251):

older:
public boolean isValid() { return mIsValid; }

newer:
public boolean isValid() { return mIsValid && TargetRegions.isValid(); }

I think this is important for this discussion because all of the older versions pass the point in the execution where Chennan and I got stuck, but all of the newer versions failed at that step.

Can you please take a look at this and see if I'm missing anything?

Thanks!

lichennan123 commented 1 year ago

> Could you paste me the log from Purple up to when it hits the initialisation error?
>
> There should be no need to revert to any earlier version of Purple.
>
> What are the contents for: /fdb/ensembl/pub/release-108/fasta/homo_sapiens/dna/ ?
>
> These should be the Hartwig Ensembl files from our resource page.
>
> thanks.

Here you go. It does not contain useful info for debugging...

10:39:34 - [INFO ] - Purple version: 3.7.2
10:39:34 - [INFO ] - output directory: /data/lic27/project_clon_evo/practice/27_linx/04_purple/
10:39:34 - [INFO ] - reference(DTB-090-N) tumor(DTB-090-PRO)
10:39:34 - [INFO ] - using ref genome: V38
10:39:34 - [ERROR] - initialisation error, exiting

About the Ensembl data cache: I wasn't able to find it via the link on the GitHub page. Since the old version, which does not require this information, works for me, I am good. But could you post the path where the Ensembl data cache can be downloaded, for anyone who would like to use it? Thank you!

Chennan
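
Since the log above stops at a generic error without naming the offending input, one low-cost first step is to confirm that every path passed to Purple exists. The following is a minimal sketch, not part of Purple: `check_inputs` is a hypothetical helper, and the variable names in the usage comment are taken from the command posted earlier in this thread. It only confirms that the paths exist, not that they hold the files Purple expects.

```shell
# Hypothetical pre-flight helper; not part of Purple.
# Prints every path that does not exist and returns non-zero if any is missing.
check_inputs() {
    local missing=0
    local path
    for path in "$@"; do
        if [ ! -e "$path" ]; then
            echo "missing: $path" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage (variable names assumed from the command earlier in the thread):
# check_inputs "$JAR" "$AMBER" "$COBALT" "$GC" "$REF" "$ENSEMBL" "$SV" "$ALLSV" || exit 1
```

Running the check before the `java -jar` call stops the script early and lists all missing paths at once, rather than one per run.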

lichennan123 commented 1 year ago

Hi Charles and Bozidar,

When I looked at the script I had previously used for running linx, I realized that the Ensembl path there came from Google Cloud (it is probably linked from the GitHub page for linx rather than the one for purple). When I used that path with the latest version of purple, it worked nicely. I would recommend that all the relevant links to the DNA resource be updated to point to the Google Cloud location, so that anyone using these wonderful tools will no longer run into the same issue. Thank you!

Chennan
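
To guard against pointing `-ensembl_data_dir` at the wrong kind of directory, as happened in this thread, a small check can confirm that the directory looks like a Hartwig Ensembl data cache before launching Purple. This is a sketch under assumptions: `check_ensembl_cache` is a hypothetical helper, and the four file names are assumed from the Hartwig resource bundle layout, so verify them against the files you actually downloaded.

```shell
# Hypothetical helper; not part of Purple or Linx.
# ASSUMPTION: these file names reflect the Hartwig Ensembl data cache layout;
# adjust the list to match the resource bundle you downloaded.
EXPECTED_ENSEMBL_FILES="ensembl_gene_data.csv ensembl_protein_features.csv ensembl_trans_exon_data.csv ensembl_trans_splice_data.csv"

# Returns non-zero and reports each expected file that is absent from the directory.
check_ensembl_cache() {
    local dir="$1"
    local missing=0
    local name
    for name in $EXPECTED_ENSEMBL_FILES; do
        if [ ! -f "$dir/$name" ]; then
            echo "not found in $dir: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage (ENSEMBL as defined in the command earlier in the thread):
# check_ensembl_cache "$ENSEMBL" || exit 1
```

A directory of Ensembl FASTA files, such as the one used in the failing command, would fail this check because it exists but contains none of the expected cache files.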