Closed turbomam closed 1 month ago
Doing this for object properties (slots that relate an instance of one class to an instance of another class) is the top priority. Among other things, it will allow us to make helpful visualizations.
Doing it for data properties (slots that relate an instance of some class to values, like one string or a list of integers) might be a lower priority.
This query isn't perfect for this task, but it does highlight several sub-optimal patterns:
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX nmdc: <https://w3id.org/nmdc/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX linkml: <https://w3id.org/linkml/>
select ?l ?r
where {
  graph nmdc:nmdc {
    ?s a owl:ObjectProperty .
    minus {
      ?s rdfs:domain ?d
    }
    optional {
      ?s rdfs:range ?r
    }
    minus {
      ?s rdfs:range linkml:String
    }
    minus {
      ?s rdfs:range linkml:Float
    }
    minus {
      ?s rdfs:range linkml:Integer
    }
    minus {
      ?s rdfs:range linkml:Boolean
    }
    minus {
      ?s rdfs:range linkml:Uriorcurie
    }
    optional {
      ?s rdfs:label ?l
    }
    filter(strstarts(str(?s), "https://w3id.org/nmdc/")) # MIXS is the only other namespace 2023-12-08
    filter(!strends(str(?s), "_set")) # in progress
  }
}
order by ?l
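For readers who don't work in SPARQL, the selection logic of the query above can be sketched in plain Python. This is only an illustration under assumptions: the dict layout (`uri`, `label`, `domains`, `ranges`), the function name `domainless_slots`, and the toy slot records are all hypothetical and are not taken from the NMDC schema or from any real query result. The input is assumed to already be limited to `owl:ObjectProperty` resources.

```python
NMDC = "https://w3id.org/nmdc/"
LINKML = "https://w3id.org/linkml/"

# The five LinkML types that the query excludes with MINUS clauses.
PRIMITIVE_RANGES = {
    LINKML + name
    for name in ("String", "Float", "Integer", "Boolean", "Uriorcurie")
}


def domainless_slots(slots):
    """Return (label, range) rows for the slots the query would keep.

    Assumes every record in `slots` is already an owl:ObjectProperty.
    """
    rows = []
    for slot in slots:
        uri = slot["uri"]
        # filter(strstarts(...)) and filter(!strends(..., "_set"))
        if not uri.startswith(NMDC) or uri.endswith("_set"):
            continue
        # minus { ?s rdfs:domain ?d }: drop slots with any asserted domain
        if slot.get("domains"):
            continue
        ranges = slot.get("ranges", [])
        # minus { ?s rdfs:range linkml:String } and the four similar clauses
        if PRIMITIVE_RANGES.intersection(ranges):
            continue
        # optional { ?s rdfs:range ?r }: one row per range, or one row with None
        for r in ranges or [None]:
            rows.append((slot.get("label"), r))
    # order by ?l (missing labels sort first here)
    return sorted(rows, key=lambda row: (row[0] or "", row[1] or ""))


# Toy records; the domain and range values are made up for illustration.
toy_slots = [
    {"uri": NMDC + "env_package", "label": "env_package",
     "domains": [], "ranges": [NMDC + "TextValue"]},
    {"uri": NMDC + "notes", "label": "notes",
     "domains": [], "ranges": [LINKML + "String"]},
    {"uri": NMDC + "example_set", "label": "example_set",
     "domains": [], "ranges": [NMDC + "Biosample"]},
    {"uri": NMDC + "slot_with_domain", "label": "slot_with_domain",
     "domains": [NMDC + "PlannedProcess"], "ranges": [NMDC + "Biosample"]},
    {"uri": "https://w3id.org/mixs/example", "label": "mixs_example",
     "domains": [], "ranges": []},
    {"uri": NMDC + "zinc", "label": "zinc", "domains": [], "ranges": []},
]

rows = domainless_slots(toy_slots)
for label, rng in rows:
    print(label, rng)
```

With these toy records, only `env_package` and `zinc` survive: the others are dropped for having a primitive range, a `_set` suffix, an asserted domain, or a non-NMDC namespace.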
?l | used in | organizational inc mixin | to remove | data property | not in use |
---|---|---|---|---|---|
input_volume | -> PlannedProcess | ? | |||
has_unit | !!! Biosample !!! | String | |||
ended_at_time | Activity | ||||
execution_resource | Activity | String | |||
started_at_time | Activity | String | |||
version | Activity | String | |||
was_informed_by | Activity | ||||
ammonium_nitrogen | Biosample | ||||
analysis_type | Biosample | ||||
biosample_categories | Biosample | ||||
bulk_elect_conductivity | Biosample | ||||
collection_date_inc | Biosample | String | |||
collection_time | Biosample | String | |||
collection_time_inc | Biosample | String | |||
dna_collect_site | Biosample | String | |||
dna_cont_type | Biosample | JgiContTypeEnum | |||
dna_cont_well | Biosample | String | |||
dna_container_id | Biosample | String | |||
dna_dnase | Biosample | YesNoEnum | |||
dna_isolate_meth | Biosample | String | |||
dna_organisms | Biosample | String | |||
dna_project_contact | Biosample | String | |||
dna_samp_id | Biosample | String | |||
dna_sample_format | Biosample | DnaSampleFormatEnum | |||
dna_sample_name | Biosample | String | |||
dna_seq_project | Biosample | String | |||
dna_seq_project_name | Biosample | String | |||
dna_seq_project_pi | Biosample | String | |||
dna_volume | Biosample | Float | |||
dnase_rna | Biosample | YesNoEnum | |||
emsl_biosample_identifiers | Biosample | ExternalIdentifier | |||
env_package | Biosample | TextValue | |||
experimental_factor_other | Biosample | String | |||
filter_method | Biosample | String | |||
igsn_biosample_identifiers | Biosample | ExternalIdentifier | |||
img_identifiers | Biosample | ExternalIdentifier | |||
insdc_biosample_identifiers | Biosample | ExternalIdentifier | |||
isotope_exposure | Biosample | String | |||
lbc_thirty | Biosample | ||||
lbceq | Biosample | ||||
manganese | Biosample | ||||
micro_biomass_c_meth | Biosample | String | |||
micro_biomass_n_meth | Biosample | String | |||
microbial_biomass_c | Biosample | String | |||
microbial_biomass_n | Biosample | String | |||
neon_biosample_identifiers | Biosample | ExternalIdentifier | |||
nitrate_nitrogen | Biosample | String | |||
nitrite_nitrogen | Biosample | String | |||
non_microb_biomass | Biosample | String | |||
non_microb_biomass_method | Biosample | String | |||
org_nitro_method | Biosample | String | |||
other_treatment | Biosample | String | |||
project_id | Biosample | String | |||
proposal_dna | Biosample | String | |||
proposal_rna | Biosample | String | |||
replicate_number | Biosample | String | |||
sample_shipped | Biosample | String | |||
sample_type | Biosample | SampleTypeEnum | |||
start_date_inc | Biosample | String | |||
start_time_inc | Biosample | String | |||
subsurface_depth | Biosample | ||||
technical_reps | Biosample | String | |||
zinc | Biosample | ||||
rna_collect_site | Biosample | String | |||
rna_cont_type | Biosample | JgiContTypeEnum | |||
rna_cont_well | Biosample | String | |||
rna_container_id | Biosample | String | |||
rna_isolate_meth | Biosample | String | |||
rna_organisms | Biosample | String | |||
rna_project_contact | Biosample | String | |||
rna_samp_id | Biosample | String | |||
rna_sample_format | Biosample | DnaSampleFormatEnum | |||
rna_sample_name | Biosample | String | |||
rna_seq_project | Biosample | String | |||
rna_seq_project_name | Biosample | String | |||
rna_seq_project_pi | Biosample | String | |||
rna_volume | Biosample | Float | |||
dna_concentration | Biosample; ProcessedSample | Float | |||
rna_concentration | Biosample; ProcessedSample | Float | |||
ecosystem | Biosample; Study | ||||
ecosystem_category | Biosample; Study | ||||
ecosystem_subtype | Biosample; Study | ||||
ecosystem_type | Biosample; Study | ||||
specific_ecosystem | Biosample; Study | String | |||
alternative_identifiers | Biosample; Study; NamedThing; MetaboliteQuantification | TRUE | |||
functional_annotation_agg | Database | ||||
data_object_type | DataObject | ||||
file_size_bytes | DataObject | Bytes | |||
extractant | Extraction | ||||
extraction_method | Extraction | String | |||
extraction_target | Extraction | ExtractionTargetEnum | |||
input_mass | Extraction → PlannedProcess | ||||
filter_pore_size | FiltrationProcess | QuantityValue | |||
separation_method | FiltrationProcess | SeparationMethodEnum | |||
subject | FunctionalAnnotation | ||||
has_function | FunctionalAnnotation | String | |||
metagenome_annotation_id | FunctionalAnnotationAggMember | ||||
encodes | GenomeFeature | ||||
end | GenomeFeature | Integer | |||
feature_type | GenomeFeature | String | |||
phase | GenomeFeature | Integer | |||
start | GenomeFeature | Integer | |||
strand | GenomeFeature | ||||
display_order | ImageValue | ||||
library_type | LibraryPreparation | LibraryTypeEnum | |||
members_id | MagBin | String | |||
total_bases | MagBin | String | |||
mags_list | MagsAnalysisActivity | ||||
gold_analysis_project_identifiers | Meta | ExternalIdentifier | |||
metabolite_quantified | MetaboliteQuantification | ||||
has_metabolite_quantifications | MetabolomicsAnalysisActivity | ||||
gold_biosample_identifiers | MetagenomeAnnotationActivity; MetatranscriptomeAnnotationActivity | ExternalIdentifier | |||
insdc_assembly_identifiers | MetagenomeAssembly; MetatranscriptomeAssembly | String | |||
has_peptide_quantifications | MetaproteomicsAnalysisActivity | ||||
duration | MixingProcess → PlannedProcess | ||||
was_generated_by | MULTIPLE UNRELATED CLASSES | ||||
id | NamedThing | String | |||
gold_sequencing_project_identifiers | OmicsProcessing | ExternalIdentifier | |||
insdc_experiment_identifiers | OmicsProcessing | ExternalIdentifier | |||
omics_type | OmicsProcessing | ||||
all_proteins | PeptideQuantification; ProteinQuantification | ||||
best_protein | PeptideQuantification; ProteinQuantification | ||||
instrument_name | PlannedProcess | String | |||
processing_institution | PlannedProcess | ||||
protocol_link | PlannedProcess | ||||
quality_control_report | PlannedProcess | ||||
volume | PlannedProcess | ||||
biomaterial_purity | ProcessedSample | ||||
status | QualityControlReport | StatusEnum | |||
has_maximum_numeric_value | QuantityValue | Float | |||
has_minimum_numeric_value | QuantityValue | Float | |||
direction | Reaction | ||||
left_participants | Reaction | ||||
right_participants | Reaction | ||||
chemical | ReactionParticipant | ||||
compound | SolutionComponent | ||||
concentration | SolutionComponent | ||||
emsl_project_identifiers | Study | ||||
gnps_task_identifiers | Study | ExternalIdentifier | |||
gold_study_identifiers | Study | ExternalIdentifier | |||
jgi_portal_study_identifiers | Study | ExternalIdentifier | |||
mgnify_project_identifiers | Study | ExternalIdentifier | |||
neon_study_identifiers | Study | ExternalIdentifier | |||
notes | Study | String | |||
related_identifiers | Study | String | |||
study_category | Study | StudyCategoryEnum | |||
insdc_bioproject_identifiers | Study; OmicsProcessing | ExternalIdentifier | |||
principal_investigator | Study; OmicsProcessing | ||||
websites | Study; PersonValue | String | |||
contained_in | SubSamplingProcess | ||||
mass | SubSamplingProcess | ||||
temperature | SubSamplingProcess → PlannedProcess | ||||
container_size | SubSamplingProcess; FiltrationProcess | ||||
language | TextValue | LanguageCode | |||
analysis_identifiers | TRUE | ||||
assembly_identifiers | TRUE | ||||
attribute | TRUE | TRUE | |||
biosample_identifiers | TRUE | ||||
emsl_identifiers | TRUE | ||||
gff_coordinate | TRUE | ||||
gnps_identifiers | TRUE | ||||
gold_identifiers | TRUE | ||||
has_participants | TRUE | ||||
igsn_identifiers | TRUE | ||||
insdc_identifiers | TRUE | ||||
jgi_portal_identifiers | TRUE | ||||
metagenome_assembly_parameter | TRUE | ||||
mgnify_identifiers | TRUE | ||||
neon_identifiers | TRUE | ||||
omics_processing_identifiers | TRUE | ||||
read_qc_analysis_statistic | TRUE | ||||
study_identifiers | TRUE | ||||
external_database_identifiers | TRUE | ExternalIdentifier | |||
date_created | ? | ||||
etl_software_version | ? | ||||
insdc_analysis_identifiers | ExternalIdentifier | ? | |||
insdc_secondary_sample_identifiers | ExternalIdentifier | ? | |||
insdc_sra_ena_study_identifiers | ExternalIdentifier | ? | |||
mgnify_analysis_identifiers | ExternalIdentifier | ? | |||
model | InstrumentModelEnum | ? | |||
sample_collection_month | String | ? | |||
value | ? | ||||
vendor | InstrumentVendorEnum | ? | |||
emsl_store_temp | String | ? |
I was going rogue (not trusting inferences) and asking to assert `domain`s for the sake of diagram drawing.
So let's actually remove the domain assertions.
When we say that some `Biosample` has `total_strontium` `{'has_numerical_value': 15, 'has_unit': 'ppm'}`, we are saying that the domain includes, at a minimum, the `Biosample` class. Possibly it includes additional classes, which hopefully would come from the same branch of the class hierarchy. Maybe `ProcessedSample`s could also have a `total_strontium` value. If that were the case, then a reasonable domain for `total_strontium` would be `MaterialSample`, given the current classes in the schema.
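The reasoning above amounts to finding the most specific class that subsumes every class observed using a slot. Here is a minimal Python sketch of that idea. The `IS_A` table is a hypothetical fragment: only the placement of `Biosample` and `ProcessedSample` under `MaterialSample` is implied by the comment above, the rest is assumed for illustration, and the sketch assumes single inheritance and ignores mixins.

```python
# Hypothetical is_a edges (child -> parent). Only Biosample and
# ProcessedSample under MaterialSample is implied by the discussion;
# the other edges are assumptions made for this example.
IS_A = {
    "Biosample": "MaterialSample",
    "ProcessedSample": "MaterialSample",
    "MaterialSample": "NamedThing",
    "Study": "NamedThing",
}


def ancestors(cls):
    """Return cls followed by its ancestors, nearest first."""
    chain = [cls]
    while chain[-1] in IS_A:
        chain.append(IS_A[chain[-1]])
    return chain


def narrowest_common_domain(using_classes):
    """Return the most specific class subsuming all using_classes, or None."""
    using_classes = list(using_classes)
    if not using_classes:
        return None
    first, *rest = using_classes
    other_chains = [set(ancestors(c)) for c in rest]
    # Walk upward from the first class; the first shared ancestor is the
    # narrowest one because the chain is ordered nearest first.
    for candidate in ancestors(first):
        if all(candidate in chain for chain in other_chains):
            return candidate
    return None


print(narrowest_common_domain(["Biosample"]))
print(narrowest_common_domain(["Biosample", "ProcessedSample"]))
```

Under these assumptions, a slot used only by `Biosample` gets `Biosample` as its inferred domain, and one used by both `Biosample` and `ProcessedSample` gets `MaterialSample`. Classes from unrelated branches push the result up to a very general class, or to `None` when nothing is shared, which is one way to flag slots like those marked "MULTIPLE UNRELATED CLASSES" in the table.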