hemberg-lab / MicroExonator

Snakemake pipeline for microexon discovery and quantification

Missing config.yaml Reference Files in Documentation, Analyses #3

Closed cburghard closed 4 years ago

cburghard commented 4 years ago

It seems the analysis strongly depends on the annotation files provided in config.yaml. In the README, the origin of some of these annotation files is vague. I would appreciate clarification on the following:

From the README: "ME_DB is a path to a known Microexon database such as VAST DB (is this optional or no?)"

As far as I can tell, VASTDB does not directly provide a bed12 of all microexons. Is it possible to link or provide the list of microexons? How does this file change the analysis? Is it actually optional?

Similarly, for "Gene_anontation_bed12", the UCSC Table Browser link defaults to GENCODE basic v19. How would using the GENCODE basic vs. comprehensive annotations affect the analysis?

geparada commented 4 years ago

Hello cburghard,

Indeed, the analysis does depend on the annotation files you provide in config.yaml. I will improve the documentation in the README file to clarify the origin of these files.

ME_DB is an optional file, but it can improve the detection rate of microexons that are already annotated. Getting bed12 files from VAST DB can be tricky, so I will try to provide an easier solution. For now, the way I get these files is by connecting to their UCSC hub (http://vastdb.crg.eu/tracks/VastDBhub/hub.txt): enter the hub link at https://genome.ucsc.edu/cgi-bin/hgHubConnect, then use the Table Browser to retrieve their tracks in bed12 format. MicroExonator scans for microexons that are annotated in these files. I am also planning to let users input standard bed files that directly indicate microexon coordinates.
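To check what a bed12 file retrieved this way actually contains, a minimal Python sketch along these lines could list the short exon blocks in it. It is not part of MicroExonator: the function name `microexons_from_bed12`, the 30 nt length cutoff, and the choice to ignore first and last blocks (terminal exons) are all assumptions made for illustration. It relies only on the standard bed12 layout, where column 11 holds block sizes and column 12 holds block starts relative to the feature start.

```python
def microexons_from_bed12(lines, max_len=30):
    """Yield unique (chrom, start, end, strand) tuples for internal
    bed12 blocks of at most max_len nucleotides.

    max_len=30 is an assumed cutoff, not a value taken from the thread.
    """
    seen = set()
    for line in lines:
        # Skip blank lines and UCSC header lines.
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # not a bed12 record
        chrom, tx_start, strand = fields[0], int(fields[1]), fields[5]
        sizes = [int(x) for x in fields[10].split(",") if x]
        starts = [int(x) for x in fields[11].split(",") if x]
        # Drop the first and last blocks (assumption: terminal exons
        # are not of interest here).
        for size, rel_start in list(zip(sizes, starts))[1:-1]:
            if size <= max_len:
                key = (chrom, tx_start + rel_start, tx_start + rel_start + size, strand)
                if key not in seen:
                    seen.add(key)
                    yield key


# Example with a hypothetical file name:
# with open("vastdb_track.bed") as handle:
#     for chrom, start, end, strand in microexons_from_bed12(handle):
#         print(chrom, start, end, strand, sep="\t")
```

Coordinates are reported in the same 0-based, half-open convention as the input bed12 file.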

Regarding the link to the UCSC Table Browser, I did not intend to set GENCODE basic v19 as the default; I think these are the default values everybody gets when opening the Table Browser. I recommend using the most comprehensive annotation available for your genome assembly of interest. The more complete the gene annotation, the more splice junctions MicroExonator can interrogate to find novel microexons, so GENCODE comprehensive is a better choice than basic. In fact, I have used GENCODE comprehensive for all the analyses done in mouse and human.
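To get a feel for how much the two annotation sets differ, one option is to count the unique splice junctions (introns) each bed12 file implies. The sketch below is only an illustration under assumptions: the helper names `junctions_from_bed12` and `compare_annotations` and the file names in the usage comment are hypothetical, and this is not how MicroExonator itself processes the annotation.

```python
def junctions_from_bed12(lines):
    """Return the set of unique introns as (chrom, intron_start, intron_end, strand)."""
    junctions = set()
    for line in lines:
        # Skip UCSC header lines and anything that is not a bed12 record.
        if line.startswith(("track", "browser", "#")):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue
        chrom, tx_start, strand = fields[0], int(fields[1]), fields[5]
        sizes = [int(x) for x in fields[10].split(",") if x]
        starts = [int(x) for x in fields[11].split(",") if x]
        # Each pair of consecutive blocks defines one intron.
        for i in range(len(sizes) - 1):
            intron_start = tx_start + starts[i] + sizes[i]
            intron_end = tx_start + starts[i + 1]
            junctions.add((chrom, intron_start, intron_end, strand))
    return junctions


def compare_annotations(basic_lines, comprehensive_lines):
    """Summarise junction counts for two bed12 annotations."""
    basic = junctions_from_bed12(basic_lines)
    comprehensive = junctions_from_bed12(comprehensive_lines)
    return {
        "basic": len(basic),
        "comprehensive": len(comprehensive),
        "only_in_comprehensive": len(comprehensive - basic),
    }


# Example with hypothetical file names:
# with open("gencode_basic.bed") as b, open("gencode_comprehensive.bed") as c:
#     print(compare_annotations(b, c))
```

A large `only_in_comprehensive` count would indicate how many additional junctions become available when the comprehensive set is used.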

Thanks for the feedback and please let me know if you have other questions or issues.

geparada commented 4 years ago

These issues are now addressed in the new documentation: https://microexonator.readthedocs.io/en/latest/discovery_and_quantification.html

Thanks a lot for your feedback.

zyh4482 commented 3 months ago

Has the issue with obtaining bed12 files from VAST DB been fixed?