Metadata requirements

lgatto commented 9 years ago

From the vignette

The Recipe is a short function, typically named NameOfDataOrigformatToFinalformat, that converts the original data into a format compatible with R/Bioconductor.
Location_Prefix is either .amazonBaseUrl, when the file to be loaded/read by the user exists on the AH Amazon S3 instance, or .prideBaseUrl, when it lives on the PRIDE FTP server.
SourceURL is the full location of the original file. This is generally the third-party server, but not necessarily.
RDataPath is the path and filename of the file to be read into R and provided to the user. This field does not contain the server address .prideBaseUrl or .amazonBaseUrl (see Location_Prefix).
The metadata list used to create the AnnotationHubResources also uses a SourceBaseUrl, which is the full URL of the original file minus the file name (the file name is stored in File). SourceBaseUrl is used to construct SourceUrl.
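A minimal R sketch of how these location fields fit together, assuming each URL is a plain concatenation of a base URL and a relative path. The base URL values and file names below are hypothetical placeholders (the real .amazonBaseUrl and .prideBaseUrl values are not given here), and joinUrl is an illustrative helper, not part of AnnotationHub.

```r
## Hypothetical placeholder values, NOT the real .amazonBaseUrl / .prideBaseUrl
amazonBaseUrl <- "https://s3.example.org/annotationhub/"
prideBaseUrl  <- "ftp://ftp.example.org/pride/"

## Illustrative helper: join a base URL and a relative path with exactly
## one "/" between them, whatever slashes the inputs carry
joinUrl <- function(base, path) {
    paste0(sub("/+$", "", base), "/", sub("^/+", "", path))
}

## Location_Prefix + RDataPath: where the user's file is read from
rdataPath <- "PXD000001/example.fasta"                       # hypothetical RDataPath
joinUrl(prideBaseUrl, rdataPath)
## [1] "ftp://ftp.example.org/pride/PXD000001/example.fasta"

## SourceBaseUrl + File: the SourceUrl of the original file
sourceBaseUrl <- "ftp://ftp.example.org/original/PXD000001/"  # hypothetical SourceBaseUrl
joinUrl(sourceBaseUrl, "example.fasta")
## [1] "ftp://ftp.example.org/original/PXD000001/example.fasta"
```

Keeping RDataPath free of the server address means the same path can be combined with whichever Location_Prefix applies.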

@sonali-bioc before checking the issues for the individual files, could you check the points quoted from the vignette, to make sure I got them right?

sonali-bioc commented 9 years ago

@lgatto - yes, the above is correct. The only correction concerns Recipe: it does not necessarily convert the original data into an R/Bioconductor-compatible format. For some FASTA files, the recipe function instead uses Rsamtools::indexFa to create an index file for the FASTA file. Most of the time, though, the description holds.
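A hedged sketch of such an indexing recipe. The function name fastaToIndexedFasta is hypothetical (it only mimics the NameOfDataOrigformatToFinalformat convention), and returning both file paths is an assumption about what a recipe hands back, not the package's actual recipe.

```r
## Hypothetical recipe for a FASTA resource (name and return value are
## illustrative). Rather than converting the data, it indexes the FASTA
## file with Rsamtools::indexFa(), which writes <file>.fai next to it.
fastaToIndexedFasta <- function(fastaFile) {
    if (!requireNamespace("Rsamtools", quietly = TRUE))
        stop("the Rsamtools package is required")
    stopifnot(file.exists(fastaFile))
    Rsamtools::indexFa(fastaFile)
    ## Assumed output: the paths of the FASTA file and its index
    c(fasta = fastaFile, index = paste0(fastaFile, ".fai"))
}
```

For example, calling it on a hypothetical proteins.fasta would leave proteins.fasta.fai alongside the input.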

lgatto commented 9 years ago

Thanks.

sonali-bioc commented 9 years ago

FYI - Title looks good now.

I did not want to pollute the other issues, so I am adding it here.

> library(AnnotationHub)
> ah = AnnotationHub()
updating AnnotationHub metadata: retrieving 1 resource
  |======================================================================| 100%
snapshotDate(): 2015-07-30
There were 50 or more warnings (use warnings() to see the first 50)
> length(ah)
[1] 34809
> tail(ah)
AnnotationHub with 6 records
# snapshotDate(): 2015-07-30
# $dataprovider: PRIDE, ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
# $species: Erwinia carotovora, Lactobacillus jensenii_JV-V16, Methanocaldoc...
# $rdataclass: OrgDb, AAStringSet, MSnSet, mzRident, mzRpwiz
# additional mcols(): taxonomyid, genome, description, tags, sourceurl,
#   sourcetype
# retrieve records with, e.g., 'object[["AH49004"]]'

            title
  AH49004 | org.Methanocaldococcus_infernus_ME.eg.sqlite
  AH49005 | org.Lactobacillus_jensenii_JV-V16.eg.sqlite
  AH49006 | PXD000001: Erwinia carotovora and spiked-in protein fasta file
  AH49007 | PXD000001: Peptide-level quantitation data
  AH49008 | PXD000001: raw mass spectrometry data
  AH49009 | PXD000001: MS-GF+ identiciation data

lgatto commented 9 years ago

Excellent!

lgatto / ProteomicsAnnotationHubData

Metadata requirements #9