mulinlab / VarNote

Fast and scalable variant annotation tool
http://mulinlab.org/varnote
BSD 3-Clause "New" or "Revised" License

VarNote-REG VarNote-PAT command line options #14

Open j2moreno opened 3 years ago

j2moreno commented 3 years ago

Wondering if there is a VarNote-REG and VarNote-PAT command-line jar option? I don't see anything when using just the VarNote.jar file:

$ java -jar VarNote-1.1.0.jar 
USAGE: java -jar /path/to/VarNote.jar <program name> [-h]

Program Summary Table:
--------------------------------------------------------------------------------------
VarNote Annotation:                              Tools that identifies desired annotation fields from database(s).
    Annotation                                   To quickly extract (by random-sweep algorithm) desired annotation fields from indexed annotation database(s) given query intervals/variants.
    AnnotationConfig                             Run VarNote Annotation program with a config file. 
    AnnotationIntersectFile                      To quickly extract desired annotation fields from an existing VarNote intersection file. 

--------------------------------------------------------------------------------------
VarNote Index:                                   Tools that generates index for the compressed database file.
    Index                                        To generate VarNote index (".vanno" and ".vanno.vi") for compressed (block gzip) annotation database file.
    IndexInfo                                    To query related information (such as header, format, meta information or sequence name) stored in the VarNote index file of each annotation database.

--------------------------------------------------------------------------------------
VarNote Query:                                   Tools that quickly retrieve data lines from database(s).
    Count                                        To quickly count intersected records in annotation database(s).
    Intersect                                    To quickly retrieve (by random-sweep algorithm) intersected records from indexed annotation database(s) given query intervals/variants.
    IntersectConfig                              Run VarNote Intersect program with a config file. 
    RandomAccess                                 To quickly retrieve (by independent random access) intersected records from indexed annotation database(s) given a genomic region like "chrN:beginPos-endPos".

--------------------------------------------------------------------------------------
mulinlab commented 3 years ago

Hi, the main reason we didn't provide local versions of VarNote-REG, VarNote-PAT and VarNote-CAN is that they rely on several large-scale annotations. However, we are working on the implementation and hope to release the local versions very soon, together with packaged annotations for download.

mulinlab commented 3 years ago

Hi, we have released a new version of VarNote, 1.2.0. This version implements three local pipelines for prioritizing genome-scale regulatory variants: VarNote-REG (GWAS, complex disease), VarNote-PAT (WGS, inherited disease) and VarNote-CAN (WGS/targeted sequencing, cancer), so that researchers can run these jobs locally.

We have also added several programs to the VarNote toolkit for efficiently processing genetic variant information, such as format conversion and LD calculation.

Please try it and report any issues to us. Thanks.

j2moreno commented 3 years ago

Thanks for this update!

I am now experiencing difficulty retrieving the full databases needed to run these tools. http://8zpsi7iiqn.51xd.pub/VarNoteDB/

The link leads to an empty page.

mulinlab commented 3 years ago

Hi, Moreno,

We have moved to a new host; please try again.

mulinlab commented 3 years ago

Also, because the full database requires a large amount of disk space and network bandwidth, our server may not support stable downloads. We are planning to move these files to Google Drive.
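If the server drops connections partway through these large files, one workaround is to resume each download with an HTTP `Range` request instead of starting over. This is a minimal Python sketch, not part of VarNote: it assumes the server honours `Range` (answering `206 Partial Content`); if the server ignores the header and sends the whole file, the code rewrites the local file from the start.

```python
import shutil
import urllib.error
import urllib.request
from pathlib import Path


def resume_download(url: str, target: Path) -> None:
    """Download url to target, continuing from any bytes already on disk."""
    offset = target.stat().st_size if target.exists() else 0
    request = urllib.request.Request(url)
    if offset:
        # Ask only for the bytes we do not have yet.
        request.add_header("Range", f"bytes={offset}-")
    try:
        resp = urllib.request.urlopen(request)
    except urllib.error.HTTPError as err:
        err.close()
        if err.code == 416:
            # Range Not Satisfiable: assume the local copy is already complete
            # (confirm with a checksum before trusting it).
            return
        raise
    with resp:
        # 206 Partial Content: the server honoured the range, so append.
        # Anything else: it sent the whole file, so overwrite from the start.
        status = getattr(resp, "status", None)
        mode = "ab" if status == 206 else "wb"
        with target.open(mode) as out:
            shutil.copyfileobj(resp, out)  # streams in chunks
```

Calling it again after a dropped connection picks up where the previous attempt stopped, provided the server supports byte ranges.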

draeath commented 3 years ago

We are planning to move these files to Google Drive

That sounds expensive.

I have a request on my end to furnish these files for some research staff. There seem to be no MD5 or SHA* checksums available to validate the files. Is this something you have and could put up on the web server?
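For reference, verifying downloads against a checksum list is straightforward once one exists. This minimal Python sketch assumes the list uses GNU `md5sum` output format (`<digest>  <path>`, with `*` before the path in binary mode); the manifest name and layout are assumptions, since no checksum file had been published at this point in the thread.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 of a file, read in 1 MiB chunks so memory use stays flat."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest: Path, root: Path) -> dict[str, bool]:
    """Map each path in an md5sum-style manifest to True if the file matches."""
    results = {}
    for line in manifest.read_text().splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split(maxsplit=1)
        if rel_path.startswith("*"):  # md5sum's binary-mode marker
            rel_path = rel_path[1:]
        target = root / rel_path
        results[rel_path] = target.is_file() and md5_of(target) == expected.lower()
    return results
```

Any path that maps to `False` is either missing or corrupted and should be fetched again.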

mulinlab commented 3 years ago

Thanks for the suggestion; we will add MD5 checksums.

draeath commented 3 years ago

Hi folks,

This is a weird one. I'm running into wget: memory exhausted while trying to mirror the contents of 2792wttzz8.xuduan.vip/VarNoteDB/hg38/ (despite this being a compute node with, at the moment, 45 GB of available memory). I suspect it has something to do with the web-server-generated index files, but I'm not sure.

Is there any way to get a plain-text listing of the tree? Something like the output of this would be perfect:

cd <wherever the VarNoteDB/ docroot is>
find . -xdev -type f

Such a listing would allow me to pull each file individually without having to rely on wget's mirroring logic.
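A minimal Python sketch of that approach: take the `find . -xdev -type f` output, turn each `./relative/path` into a URL under the mirror's base directory, and fetch the files one at a time, streaming each to disk. The function names, the `.part` temporary naming and the skip-if-present behaviour are illustrative choices, not anything provided by VarNote.

```python
from __future__ import annotations

import shutil
import urllib.request
from pathlib import Path
from urllib.parse import quote, urljoin


def listing_to_urls(listing: str, base_url: str) -> list[tuple[str, str]]:
    """Turn `find . -type f` output into (relative_path, url) pairs."""
    if not base_url.endswith("/"):
        base_url += "/"  # otherwise urljoin drops the last path segment
    pairs = []
    for line in listing.splitlines():
        rel = line.strip()
        if not rel:
            continue
        if rel.startswith("./"):
            rel = rel[2:]
        pairs.append((rel, urljoin(base_url, quote(rel))))
    return pairs


def fetch_all(pairs: list[tuple[str, str]], dest: Path) -> None:
    """Download each file individually, skipping ones already completed."""
    for rel, url in pairs:
        target = dest / rel
        if target.exists():
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        tmp = target.with_name(target.name + ".part")
        with urllib.request.urlopen(url) as resp, tmp.open("wb") as out:
            shutil.copyfileobj(resp, out)  # chunked copy, never the whole file in memory
        tmp.rename(target)  # only complete files get their final name
```

For example, with the listing saved to a file (the name `listing.txt` is hypothetical), `fetch_all(listing_to_urls(Path("listing.txt").read_text(), "http://2792wttzz8.xuduan.vip/VarNoteDB/hg38/"), Path("hg38"))` would mirror the tree; note that the last comment in this thread reports that host now returns 404.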

mulinlab commented 3 years ago

./VarNoteDB_AF_gnomAD_Genome/VarNoteDB_AF_gnomAD_Genome.vcf.gz
./VarNoteDB_AF_gnomAD_Genome/VarNoteDB_AF_gnomAD_Genome.vcf.gz.vanno
./VarNoteDB_AF_gnomAD_Genome/VarNoteDB_AF_gnomAD_Genome.vcf.gz.tbi
./VarNoteDB_AF_gnomAD_Genome/VarNoteDB_AF_gnomAD_Genome.vcf.gz.vanno.vi
./VarNoteDB_FA_dbNSFP/VarNoteDB_FA_dbNSFP.gz.vanno
./VarNoteDB_FA_dbNSFP/VarNoteDB_FA_dbNSFP.gz
./VarNoteDB_FA_dbNSFP/VarNoteDB_FA_dbNSFP.gz.tbi
./VarNoteDB_FA_dbNSFP/VarNoteDB_FA_dbNSFP.gz.vanno.vi
./VarNoteDB_FA_regBase_prediction/VarNoteDB_FA_regBase_prediction.gz
./VarNoteDB_FA_regBase_prediction/VarNoteDB_FA_regBase_prediction.gz.vanno.vi
./VarNoteDB_FA_regBase_prediction/VarNoteDB_FA_regBase_prediction.gz.vanno
./VarNoteDB_Reference/Gene/hg38_ensembl.ser
./VarNoteDB_Reference/Gene/hg38_refseq.ser
./VarNoteDB_Reference/Gene/hg38_ucsc.ser
./VarNoteDB_Reference/LD/1kg.phase3.v5.shapeit2.afr.hg38.all.split.multi.vcf.gz.bit
./VarNoteDB_Reference/LD/1kg.phase3.v5.shapeit2.afr.hg38.all.split.multi.vcf.gz.bit.idx
./VarNoteDB_Reference/LD/1kg.phase3.v5.shapeit2.amr.hg38.all.split.multi.vcf.gz.bit
./VarNoteDB_Reference/LD/1kg.phase3.v5.shapeit2.amr.hg38.all.split.multi.vcf.gz.bit.idx
./VarNoteDB_Reference/LD/1kg.phase3.v5.shapeit2.eas.hg38.all.split.multi.vcf.gz.bit
./VarNoteDB_Reference/LD/1kg.phase3.v5.shapeit2.eas.hg38.all.split.multi.vcf.gz.bit.idx
./VarNoteDB_Reference/LD/1kg.phase3.v5.shapeit2.eur.hg38.all.split.multi.vcf.gz.bit
./VarNoteDB_Reference/LD/1kg.phase3.v5.shapeit2.eur.hg38.all.split.multi.vcf.gz.bit.idx
./VarNoteDB_Reference/LD/1kg.phase3.v5.shapeit2.sas.hg38.all.split.multi.vcf.gz.bit
./VarNoteDB_Reference/LD/1kg.phase3.v5.shapeit2.sas.hg38.all.split.multi.vcf.gz.bit.idx
./VarNoteDB_Reference/LD/ld_block/AFR_ld.block
./VarNoteDB_Reference/LD/ld_block/EAS_ld.block
./VarNoteDB_Reference/LD/ld_block/EUR_ld.block
./VarNoteDB_Reference/LD/ser/hg38_ensembl.ser
./VarNoteDB_Reference/LD/ser/hg38_refseq.ser
./VarNoteDB_Reference/LD/ser/hg38_ucsc.ser

./VarNoteDB_FA_CellTypeScore/fitcons2.merge.gz
./VarNoteDB_FA_CellTypeScore/fitcons2.merge.gz.vanno
./VarNoteDB_FA_CellTypeScore/fitcons2.merge.gz.vanno.vi
./VarNoteDB_FA_CellTypeScore/FUN-LDA.merge.gz
./VarNoteDB_FA_CellTypeScore/FUN-LDA.merge.gz.vanno
./VarNoteDB_FA_CellTypeScore/FUN-LDA.merge.gz.vanno.vi
./VarNoteDB_FA_CellTypeScore/GenoNet_all.bgz
./VarNoteDB_FA_CellTypeScore/GenoNet_all.bgz.vanno
./VarNoteDB_FA_CellTypeScore/GenoNet_all.bgz.vanno.vi
./VarNoteDB_FA_CellTypeScore/GenoSkylinePlus.merge.gz
./VarNoteDB_FA_CellTypeScore/GenoSkylinePlus.merge.gz.vanno
./VarNoteDB_FA_CellTypeScore/GenoSkylinePlus.merge.gz.vanno.vi
./VarNoteDB_FA_regBase/VarNoteDB_FA_regBase.gz
./VarNoteDB_FA_regBase/VarNoteDB_FA_regBase.gz.vanno
./VarNoteDB_FA_regBase/VarNoteDB_FA_regBase.gz.vanno.vi
./VarNoteDB_FA_regBase/VarNoteDB_FA_regBase.gz.tbi
./VarNoteDB_FP_Roadmap_127Epi/VarNoteDB_FP_Roadmap_127Epi.bed.gz.vanno
./VarNoteDB_FP_Roadmap_127Epi/VarNoteDB_FP_Roadmap_127Epi.bed.gz.tbi
./VarNoteDB_FP_Roadmap_127Epi/VarNoteDB_FP_Roadmap_127Epi.bed.gz
./VarNoteDB_FP_Roadmap_127Epi/VarNoteDB_FP_Roadmap_127Epi.bed.gz.vanno.vi
./VarNoteDB_TA_COSMIC_NonCoding/VarNoteDB_TA_COSMIC_NonCoding.vcf.gz.vanno.vi
./VarNoteDB_TA_COSMIC_NonCoding/VarNoteDB_TA_COSMIC_NonCoding.vcf.gz.tbi
./VarNoteDB_TA_COSMIC_NonCoding/VarNoteDB_TA_COSMIC_NonCoding.vcf.gz
./VarNoteDB_TA_COSMIC_NonCoding/VarNoteDB_TA_COSMIC_NonCoding.vcf.gz.vanno
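After mirroring, a quick sanity check is to confirm that every compressed database file came with both VarNote index files (".vanno" and ".vanno.vi", as described for the `Index` program in the summary table). This Python sketch assumes every `.gz`/`.bgz` file in the tree is a VarNote database; reference files such as `.ser`, `.bit` and `.block` are not checked, and the function name is hypothetical.

```python
from __future__ import annotations

INDEX_SUFFIXES = (".vanno", ".vanno.vi")  # index files produced by VarNote `Index`
DATA_SUFFIXES = (".gz", ".bgz")           # assumption: every block-gzip file is a database


def missing_indexes(paths: list[str]) -> dict[str, list[str]]:
    """Map each database file to the index files absent from the listing."""
    present = set(paths)
    missing = {}
    for p in paths:
        if p.endswith(DATA_SUFFIXES):
            absent = [p + s for s in INDEX_SUFFIXES if p + s not in present]
            if absent:
                missing[p] = absent
    return missing
```

Running it on the listing above (split on whitespace) returns an empty dict, since each `.gz`/`.bgz` entry there has both index files.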

GitBioinformatics commented 2 months ago

Not Found

The requested URL was not found on this server.

Apache/2.4.41 (Ubuntu) Server at 2792wttzz8.xuduan.vip Port 80