Open holtgrewe opened 2 years ago
Below are some notes on what we will need.
precondition: server and project correctly configured, storage setup
Phenopackets documentation
This includes:
x-xy-karyotype for gonosomal karyotype
GA4GH Pedigree standard
Sketch
A Phenopacket captures most aspects. The Phenopacket authors recommend using file annotations for capturing sequencing details.
# A VarFish Case corresponds to a Phenopacket family.
family:
  # The identifier is automatically set to the SODAR UUID when created.
  id: $CASE_SODAR_UUID
  # The proband and relatives must match the definitions in pedigree
  # below.
  proband:
    id: $INDEX_NAME
    subject:
      id: $INDEX_NAME
      sex: MALE # FEMALE / OTHER_SEX / UNKNOWN_SEX
      karyotypic_sex: UNKNOWN_KARYOTYPE # cf. https://phenopacket-schema.readthedocs.io/en/latest/karyotypicsex.html#rstkaryotypicsex
    phenotypicFeatures:
      - type:
          id: "HP:0012469"
          label: "Infantile spasms"
        excluded: false
        modifiers:
          - id: "HP:0031796"
            label: "Recurrent"
    measurements:
      # we only allow one measurement in VarFish 2.0
      - description: WGS / WES / Panel-seq of
        assay:
          id: NCIT:C158253
          label: Targeted Genome Sequencing
          # alternative #1
          #id: NCIT:C101295
          #label: Whole Exome Sequencing
          # alternative #2
          #id: NCIT:C101294
          #label: Whole Genome Sequencing
        measurement_value:
          value:
            id: NCIT:C171177
            label: Sequencing Data File
        time_observed: # optional
          timestamp: the timestamp
    diseases:
      - term: OMIM:xxx
        excluded: false
    files: # FILES FOR PROBAND
      # Use s3:// URL with path to targets to identify the enrichment kit.
      - uri: s3://...
        individualToFileIdentifiers:
          IDENTIFIER_INDEX: identifier in file
        fileAttributes:
          genomeAssembly: GRCh38 # GRCh37
          fileFormat: vcf # BAM etc.
          description: free-text description
      # TODO: more file examples, possibly extended attributes
    metadata: # COPY AND PASTE FROM BELOW
  relatives:
    - # ... list of phenopackets
  pedigree:
    persons:
      - familyId: FAM
        individualId: IDENTIFIER_INDEX
        paternalId: 0
        maternalId: 0
        sex: MALE
        affectedStatus: UNAFFECTED
  files: # FILES FOR WHOLE FAMILY
    - uri: ...
  metadata:
    created: 2019-07-21T00:25:54.662Z
    createdBy: Peter R.
    resources:
      - id: hp
        name: human phenotype ontology
        url: http://purl.obolibrary.org/obo/hp.owl
        version: 2018-03-08
        namespacePrefix: HP
        iriPrefix: hp
      - id: geno
        name: Genotype Ontology
        url: http://purl.obolibrary.org/obo/geno.owl
        version: 19-03-2018
        namespacePrefix: GENO
        iriPrefix: geno
      - id: pubmed
        name: PubMed
        url: https://www.ncbi.nlm.nih.gov/pubmed/
        namespacePrefix: PMID
      - id: orphanet
        name: orphanet rare disease ontology
        url: http://www.orpha.net/
        namespacePrefix: ORPHA
        iriPrefix: orpha
      - id: omim
        name: Online Mendelian Inheritance in Man
        url: http://www.omim.org/
        namespacePrefix: OMIM
        iriPrefix: omim
      - id: ncit
        name: National Cancer Institute Thesaurus
        url: https://bioportal.bioontology.org/ontologies/NCIT/
        namespacePrefix: NCIT
        iriPrefix: ncit
    phenopacketSchemaVersion: 2.0
The Case model will be changed to match the Phenopacket subset that we aim to support. We will replace the CaseImportInfo record with a CaseAction model. Mass data will be stored outside of the database. This could be an S3 storage that the user has read/write and VarFish has read access to. VarFish will store any data in an internal storage, e.g., an internal S3 storage.
Case creation
- CaseAction with ACTION=CREATE => STATE=DRAFT, maybe as clone of existing
- CaseAction in STATE=DRAFT
Case update
- CaseAction with ACTION=UPDATE => STATE=DRAFT, maybe as clone of existing
- CaseAction in STATE=DRAFT
- CaseAction with STATE=DRAFT => STATE=SUBMITTED
Case deletion
- CaseAction with ACTION=DELETE => STATE=DRAFT, maybe as clone of existing
- CaseAction in STATE=DRAFT => STATE=SUBMITTED
CaseAction states:
Case states:
- CaseAction in DRAFT or ACTIVE state per case
- CaseAction only valid if Case state is ACTIVE
We use the family top level element for phenopackets 2.0.
The following metadata entry is supported (versions can be adjusted; ids, prefixes, and URLs must stay the same). The same entry is used in all relevant places.
metadata:
  created: $CREATION
  createdBy: $CREATOR
  resources:
    - id: hp
      name: human phenotype ontology
      url: http://purl.obolibrary.org/obo/hp.owl
      version: 2018-03-08
      namespacePrefix: HP
      iriPrefix: hp
    - id: geno
      name: Genotype Ontology
      url: http://purl.obolibrary.org/obo/geno.owl
      version: 19-03-2018
      namespacePrefix: GENO
      iriPrefix: geno
    - id: pubmed
      name: PubMed
      url: https://www.ncbi.nlm.nih.gov/pubmed/
      namespacePrefix: PMID
    - id: orphanet
      name: orphanet rare disease ontology
      url: http://www.orpha.net/
      namespacePrefix: ORPHA
      iriPrefix: orpha
    - id: omim
      name: Online Mendelian Inheritance in Man
      url: http://www.omim.org/
      namespacePrefix: OMIM
      iriPrefix: omim
    - id: ncit
      name: National Cancer Institute Thesaurus
      url: https://bioportal.bioontology.org/ontologies/NCIT/
      namespacePrefix: NCIT
      iriPrefix: ncit
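The rule that versions may change while ids, prefixes, and URLs must stay fixed could be checked along the following lines. This is only a sketch: `SUPPORTED_RESOURCES` and `check_resources` are hypothetical names that are not part of VarFish, and the table is copied from the metadata entry above.

```python
# Hypothetical reference table, copied from the supported metadata entry.
# "name" and "version" are deliberately left out: versions may be adjusted.
SUPPORTED_RESOURCES = {
    "hp": {"url": "http://purl.obolibrary.org/obo/hp.owl",
           "namespacePrefix": "HP", "iriPrefix": "hp"},
    "geno": {"url": "http://purl.obolibrary.org/obo/geno.owl",
             "namespacePrefix": "GENO", "iriPrefix": "geno"},
    "pubmed": {"url": "https://www.ncbi.nlm.nih.gov/pubmed/",
               "namespacePrefix": "PMID"},
    "orphanet": {"url": "http://www.orpha.net/",
                 "namespacePrefix": "ORPHA", "iriPrefix": "orpha"},
    "omim": {"url": "http://www.omim.org/",
             "namespacePrefix": "OMIM", "iriPrefix": "omim"},
    "ncit": {"url": "https://bioportal.bioontology.org/ontologies/NCIT/",
             "namespacePrefix": "NCIT", "iriPrefix": "ncit"},
}


def check_resources(resources):
    """Return a list of problems; an empty list means the resources are fine."""
    problems = []
    for res in resources:
        expected = SUPPORTED_RESOURCES.get(res.get("id"))
        if expected is None:
            problems.append(f"unsupported resource id: {res.get('id')!r}")
            continue
        # Only the fixed fields are compared; the version is ignored.
        for key, value in expected.items():
            if res.get(key) != value:
                problems.append(
                    f"resource {res['id']!r}: {key} must be {value!r}, "
                    f"got {res.get(key)!r}"
                )
    return problems
```

The function takes the already parsed `metadata.resources` list, so it works the same whether the manifest was read from YAML or JSON.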
We support the full types of family.pedigree.persons. The family_id of all persons must be the same, and each individual_id must link back to the id and subject.id of the proband or of a relative.
family:
  pedigree:
    persons:
      - family_id: FAM
        individual_id: IDENTIFIER_INDEX
        paternal_id: 0
        maternal_id: 0
        sex: MALE
        affected_status: UNAFFECTED
      # ...
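The two pedigree constraints (one shared family_id, and every individual_id linking back to a phenopacket id and subject.id) might be validated roughly as follows. `check_pedigree` is a hypothetical helper for illustration that operates on the parsed `family` mapping; it is not an existing VarFish function.

```python
def check_pedigree(family):
    """Return a list of problems found in family.pedigree; empty means OK."""
    problems = []
    persons = family.get("pedigree", {}).get("persons", [])

    # All persons must share one family_id.
    family_ids = {person.get("family_id") for person in persons}
    if len(family_ids) > 1:
        problems.append(
            f"multiple family_id values: {sorted(map(str, family_ids))}"
        )

    # Collect the ids of proband and relatives; id and subject.id must agree.
    proband = family.get("proband")
    packets = ([proband] if proband else []) + list(family.get("relatives", []))
    packet_ids = set()
    for packet in packets:
        packet_id = packet.get("id")
        if packet_id != packet.get("subject", {}).get("id"):
            problems.append(f"id and subject.id differ for {packet_id!r}")
        packet_ids.add(packet_id)

    # Every pedigree person must link back to one of those ids.
    for person in persons:
        if person.get("individual_id") not in packet_ids:
            problems.append(
                f"individual_id {person.get('individual_id')!r} has no "
                "matching proband/relative"
            )
    return problems
```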
The family.proband and family.relatives entries support the following Phenopackets subset (shown here for family.proband):
family:
  proband:
    id: IDENTIFIER_INDEX
    subject:
      id: IDENTIFIER_INDEX # must match ../../id
      sex: # all supported
      karyotypic_sex: # all supported
    diseases:
      - term: # OMIM or Orphanet disease
        excluded: false # or true
    phenotypic_features:
      - type:
          id:
          label:
        excluded: false # or true
        modifiers:
          - id:
            label:
    measurements:
      # exactly one measurement is allowed
      - description: WGS # or WES or Panel-seq
        assay:
          id: NCIT:C158253
          label: Targeted Genome Sequencing
          # # alternative #1
          # id: NCIT:C101295
          # label: Whole Exome Sequencing
          # # alternative #2
          # id: NCIT:C101294
          # label: Whole Genome Sequencing
    files:
      # MUST define path to enrichment kit unless WGS
      - uri: s3://...
        file_attributes:
          designation: enrichment_kit_targets
          genome_assembly: GRCh37 # or GRCh38 MUST be given
          file_format: BED
          description: free-text description
      # MAY contain further files for defining per-sample files
      - uri: ...
        file_attributes: {} # ...
        # use file_format: BAM/CRAM for alignment files
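The per-individual rules above (exactly one measurement, an enrichment kit targets file unless the assay is WGS, and a mandatory genome assembly) could be checked with something like the sketch below. `check_phenopacket` is a hypothetical name, and treating NCIT:C101294 as the only WGS marker is an assumption based on the assay ids listed above.

```python
WGS_ASSAY_ID = "NCIT:C101294"  # Whole Genome Sequencing, as listed above
KNOWN_ASSEMBLIES = ("GRCh37", "GRCh38")


def check_phenopacket(packet):
    """Return a list of problems for one proband/relative phenopacket."""
    problems = []

    measurements = packet.get("measurements", [])
    if len(measurements) != 1:
        problems.append(
            f"expected exactly one measurement, found {len(measurements)}"
        )
        return problems
    assay_id = measurements[0].get("assay", {}).get("id")

    # Find the files that describe the enrichment kit targets.
    kit_files = [
        f for f in packet.get("files", [])
        if f.get("file_attributes", {}).get("designation")
        == "enrichment_kit_targets"
    ]
    if assay_id != WGS_ASSAY_ID and not kit_files:
        problems.append("enrichment kit targets file is required unless WGS")

    # The genome assembly must be given for each targets file.
    for f in kit_files:
        assembly = f["file_attributes"].get("genome_assembly")
        if assembly not in KNOWN_ASSEMBLIES:
            problems.append(
                f"{f.get('uri')}: genome_assembly must be GRCh37 or GRCh38, "
                f"got {assembly!r}"
            )
    return problems
```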
The family.files list can contain small and structural variant files:
family:
  files:
    - uri: s3://path/family.gatk_hc.vcf.gz
      file_attributes:
        designation: seqvars
        genome_assembly: GRCh37 # or GRCh38 MUST be given and consistent in case
        file_format: VCF
        caller: GATK-HC
    - uri: s3://path/family.delly2.vcf.gz
      file_attributes:
        designation: strucvars
        genome_assembly: GRCh37 # or GRCh38 MUST be given and consistent in case
        file_format: VCF
        caller: Delly2
    - uri: s3://path/family.manta.vcf.gz
      file_attributes:
        designation: strucvars
        genome_assembly: GRCh37 # or GRCh38 MUST be given and consistent in case
        file_format: VCF
        caller: Manta
    - uri: s3://path/family.gcnv.vcf.gz
      file_attributes:
        designation: strucvars
        genome_assembly: GRCh37 # or GRCh38 MUST be given and consistent in case
        file_format: VCF
        caller: GATK-gCNV
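A consistency check for the family-level files might look like the sketch below. `check_family_files` is a hypothetical helper, and restricting `designation` to `seqvars` and `strucvars` is an assumption drawn from the examples above rather than a confirmed rule.

```python
FAMILY_DESIGNATIONS = ("seqvars", "strucvars")  # assumed from the examples
KNOWN_ASSEMBLIES = ("GRCh37", "GRCh38")


def check_family_files(files):
    """Return a list of problems for the family.files list."""
    problems = []
    assemblies = set()
    for f in files:
        attrs = f.get("file_attributes", {})
        if attrs.get("designation") not in FAMILY_DESIGNATIONS:
            problems.append(
                f"{f.get('uri')}: unexpected designation "
                f"{attrs.get('designation')!r}"
            )
        assembly = attrs.get("genome_assembly")
        if assembly not in KNOWN_ASSEMBLIES:
            problems.append(
                f"{f.get('uri')}: genome_assembly must be given, "
                f"got {assembly!r}"
            )
        else:
            assemblies.add(assembly)
    # The assembly must be consistent within the case.
    if len(assemblies) > 1:
        problems.append(
            f"inconsistent genome_assembly values: {sorted(assemblies)}"
        )
    return problems
```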
We introduce the new Django app case_import for the new functionality.
We introduce the following new endpoints for CaseImportAction.
/case-import/api/case-import-action/list-create/<project>/[?case=<case>]
- List all CaseImportAction objects in a project (optionally, only for a case). Also allows creation of a new one.
/case-import/api/case-import-action/retrieve-update/<case-import-action>/
- Retrieve a CaseImportAction. Also allows updates, for example changing the state of a DRAFT action to SUBMITTED. However, illegal state transitions are prevented.
Note that the current disease and phenotype information of a case may differ from what was imported, and re-importing based only on a previous import action will override this information. Users thus have to fetch the current information from the normal case API.
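Preventing illegal state transitions on update could be done with a small transition table. The sketch below is an assumption, not the planned implementation: it only models the DRAFT and SUBMITTED states named above, treats SUBMITTED as final from the API user's perspective, and uses the hypothetical names `State`, `ALLOWED_TRANSITIONS`, and `transition`.

```python
from enum import Enum


class State(Enum):
    DRAFT = "DRAFT"
    SUBMITTED = "SUBMITTED"


# Assumed rule: a draft may be submitted; a submitted action cannot be changed.
ALLOWED_TRANSITIONS = {
    State.DRAFT: {State.SUBMITTED},
    State.SUBMITTED: set(),
}


def transition(current, target):
    """Return the new state, or raise ValueError for an illegal transition."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(
            f"illegal transition: {current.value} -> {target.value}"
        )
    return target
```

In a Django REST framework serializer, a check like this would typically live in the validation step of the update, so that an illegal request is rejected before anything is saved.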
Is your feature request related to a problem? Please describe.
We can currently (only) provide a PED file for describing a case. It would be very helpful to augment this as we currently already store information in VarFish that cannot be encoded in a canonical PLINK PED file. This means we cannot export into the same format that we import from and it would also be nice to import with more information as well.
Describe the solution you'd like
Design a "case manifest" format that describes the relevant aspects of a case. We should reuse (community) standard data formats where possible; including:
Describe alternatives you've considered
N/A
Additional context