sidharthramesh closed this issue 3 years ago
If you use the "list" command at the command line it will tell you what files it finds. It may be being too strict with file name conventions.
Also, after import try using the status command to list the installed reference sets to double check what is installed, or not.
Also, is the file in question listed as a file imported during import?
On listing the refsets in SnomedCT_IndiaReferenceSetsRF2_PRODUCTION_202108067T120000Z, only 6 out of 32 files in the directory were listed. Directory and output of list pasted below.
Directory:
der2_cRefset_AssociationReferenceSnapshot_IN1000189_20210806.txt
der2_cRefset_AttributeValueSnapshot_IN1000189_20210806.txt
der2_Refset_cardiologySnapshot_IN1000189_20210806.txt
der2_Refset_cardiothoracicAndVascularSurgerySnapshot_IN1000189_20210806.txt
der2_Refset_cataractSnapshot_IN1000189_20210806.txt
der2_Refset_cervicalCancerSnapshot_IN1000189_20210806.txt
der2_Refset_childhoodDiarrheaSnapshot_IN1000189_20210806.txt
der2_Refset_dengueSnapshot_IN1000189_20210806.txt
der2_Refset_dermatologySnapshot_IN1000189_20210806.txt
der2_Refset_emergencySnapshot_IN1000189_20210806.txt
der2_Refset_fetalMedicineSnapshot_IN1000189_20210806.txt
der2_Refset_gastroenterologySnapshot_IN1000189_20210806.txt
der2_Refset_generalSurgerySnapshot_IN1000189_20210806.txt
der2_Refset_geriatricsSnapshot_IN1000189_20210806.txt
der2_Refset_iodineDeficiencySnapshot_IN1000189_20210806.txt
der2_Refset_leprosySnapshot_IN1000189_20210806.txt
der2_Refset_lymphaticFilariasisSnapshot_IN1000189_20210806.txt
der2_Refset_malariaSnapshot_IN1000189_20210806.txt
der2_Refset_nephrologySnapshot_IN1000189_20210806.txt
der2_Refset_neurologySnapshot_IN1000189_20210806.txt
der2_Refset_neurosurgerySnapshot_IN1000189_20210806.txt
der2_Refset_obstetricsAndGynecologySnapshot_IN1000189_20210806.txt
der2_Refset_oncologySnapshot_IN1000189_20210806.txt
der2_Refset_oralCancerSnapshot_IN1000189_20210806.txt
der2_Refset_orthopedicsSnapshot_IN1000189_20210806.txt
der2_Refset_pediatricsSnapshot_IN1000189_20210806.txt
der2_Refset_pregnancyRelatedAnemiaSnapshot_IN1000189_20210806.txt
der2_Refset_psychiatrySnapshot_IN1000189_20210806.txt
der2_Refset_radiologySnapshot_IN1000189_20210806.txt
der2_Refset_rheumatologySnapshot_IN1000189_20210806.txt
der2_Refset_strokeSnapshot_IN1000189_20210806.txt
der2_Refset_tuberculosisSnapshot_IN1000189_20210806.txt
OUTPUT:
| :filename                                                        | :component           | :version-date | :format | :content-subtype             | :content-type |
|------------------------------------------------------------------+----------------------+---------------+---------+------------------------------+---------------|
| der2_cRefset_AssociationReferenceSnapshot_IN1000189_20210806.txt | AssociationRefset    | 2021-08-06    | 2       | AssociationReferenceSnapshot | cRefset       |
| der2_cRefset_AttributeValueSnapshot_IN1000189_20210806.txt       | AttributeValueRefset | 2021-08-06    | 2       | AttributeValueSnapshot       | cRefset       |
| der2_cRefset_LanguageSnapshot-en_IN1000189_20210806.txt          | LanguageRefset       | 2021-08-06    | 2       | LanguageSnapshot-en          | cRefset       |
| sct2_Concept_Snapshot_IN1000189_20210806.txt                     | Concept              | 2021-08-06    | 2       | Snapshot                     | Concept       |
| sct2_Description_Snapshot-en_IN1000189_20210806.txt              | Description          | 2021-08-06    | 2       | Snapshot-en                  | Description   |
| sct2_Relationship_Snapshot_IN1000189_20201127.txt                | Relationship         | 2020-11-27    | 2       | Snapshot                     | Relationship  |
See https://confluence.ihtsdotools.org/plugins/servlet/mobile?contentId=56330817#content/view/56330817
Am I being too strict in my interpretation of the file name conventions here? What type of refsets are these? Are they simple?
They are simple refsets. They are released by the government body in India. Not sure if they are using a wrong naming convention. Will have to give the document you mentioned a read.
Looks as if I'm using the "Summary" field of the "ContentSubType" element of the filename to determine reference set type. This works with the UK edition, but it isn't working with other distributions. We can determine the type of reference set in other ways:
- a der2_Refset file (no pattern) is always a simple refset
- a der2_[pattern]Refset file will include additional columns, which we could examine to determine the reference set type if it is not included in the Summary field.
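To make the idea above concrete, here is a minimal sketch in Python (not the hermes implementation, which is Clojure) of classifying a refset file from the pattern in its filename rather than from the Summary field. The regular expression and the function name `describe_refset_file` are assumptions made for illustration, based only on the filenames quoted in this thread.

```python
import re

# Hypothetical regex for RF2 refset file names such as
# der2_cRefset_AttributeValueSnapshot_IN1000189_20210806.txt
REFSET_NAME = re.compile(
    r"^der2_(?P<pattern>[cis]*)Refset_"
    r"(?P<summary>.*?)(?P<release_type>Full|Snapshot|Delta)"
    r"(?:-(?P<language>[A-Za-z-]+))?_"
    r"(?P<namespace>[^_]+)_(?P<version>\d{8})\.txt$"
)


def describe_refset_file(filename):
    """Return the parts of a refset filename, or None if it does not match."""
    m = REFSET_NAME.match(filename)
    if m is None:
        return None
    pattern = m.group("pattern")
    return {
        "pattern": pattern,
        "summary": m.group("summary"),
        "release_type": m.group("release_type"),
        "language": m.group("language"),
        "version": m.group("version"),
        # An empty pattern means no extra columns, i.e. a simple refset.
        "simple": pattern == "",
    }
```

With this sketch, `der2_Refset_cardiologySnapshot_IN1000189_20210806.txt` is reported as a simple refset even though "cardiology" is not a known refset type name.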
I don't think they are using the wrong naming convention - it's just different to the UK, and a bit more complicated to determine the type of reference set. Looks like a bug on my part.
@wardle Great! Yes. I just finished reading the docs you gave me, and the naming convention seems okay. Regarding how to detect if it’s a simple refset, I think your approach seems sensible.
The fix for this will also fix #30 which is nice.
Yes!! Great! Any way we could help?
I'm pretty much done but I'd like to add more testing.
But as part of doing that, I have come across an issue with the Spanish refsets, which don't use patterns or names in file names. It's not completely clear to me yet, but I may need to try to deduce file types by looking at column headings. Patterns are meant to tell you how to serialise the user-defined data in the item - c or i or s - but without that, all one can do is treat the values as strings and leave all the work to the client.
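As an illustration of what the pattern buys you, the sketch below (Python, hypothetical names, not hermes code) converts the extra columns of a refset row according to a c/i/s pattern, and falls back to plain strings when no pattern is known. It assumes the six standard refset columns come first and that both component identifiers and integers are stored as integers.

```python
# Number of standard refset columns: id, effectiveTime, active,
# moduleId, refsetId, referencedComponentId.
BASE_COLUMN_COUNT = 6

# c = component identifier, i = integer, s = string.
CONVERTERS = {"c": int, "i": int, "s": str}


def parse_extra_fields(pattern, row):
    """Convert the columns after the standard six using the pattern.

    pattern is a string such as "" (simple refset), "c" or "cis",
    or None when the pattern is unknown.
    """
    extras = row[BASE_COLUMN_COUNT:]
    if pattern is None:
        # No pattern available: leave interpretation to the client.
        return list(extras)
    if len(pattern) != len(extras):
        raise ValueError(
            f"pattern {pattern!r} expects {len(pattern)} extra columns, "
            f"found {len(extras)}"
        )
    return [CONVERTERS[ch](value) for ch, value in zip(pattern, extras)]
```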
Currently the Indian refsets you shared are picked up nicely and there's no issue because they're all simple refsets.
It would be helpful if you can test with the full distribution. I can push work so far to a different branch for you to test if you're willing, or I will make a synthetic distribution with these issues in it and test with that and release when I'm happy.
Thank you. That sounds doable. Please push your changes and let me know. I’ll test it with all the distributions I have and see if it’s picking up all files and report if there are any missing. I think I also have access to the Spanish distribution. Will check that too if I have time.
Thanks @sidharthramesh - the big issue is with the Spanish distribution. Here is just one example:
Filename
der2_cRefset_VMPPCNSpainDrugMapSnapshot_es-ES_es_20211001.txt
So what is that? Well it is a refset. It should have the basic structure and then an extra column 'c' - ie a concept identifier. So let's take a look:
id effectiveTime active moduleId refsetId referencedComponentId mapGroup mapPriority mapRule mapAdvice mapTarget correlationId mapCategoryId referencedComponentTerm orderGroup
7e539731-3c21-41fb-ad09-66606c164a60 20180501 1 90000011000140108 90000091000140102 720507 1 1 TRUE ALWAYS RELATED MAPTARGET 54851000140103 22681000122102 22651000122108 TADALAFILO TECNIGEN 20 MG COMPRIMIDOS RECUBIERTOS CON PELICULA EFG , 4 COMPRIMIDOS 1
So it's a cross map, with an 'orderGroup' on the end - so not a concept identifier but an integer. Which is fine from a serialisation point of view, because we'd store both as a long anyway, but not ideal. I have other examples in which they've put a string in there and not used a correct pattern.
So at the moment, hermes complains, because it can't work out that this is a cross-map.
The options are:
I’ve always felt that failing outright is better than making assumptions on unstable grounds. Assumptions can make the tool behave differently from what the user expects. I would go with simple and transparent over “automagical” behaviour.
With the Spanish reference set - I think it does not follow the naming convention. I think the error should just say something like “refset format and naming convention mismatch” and ask the end user (or ideally the Spanish release center) to rename it correctly.
Until then, instead of failing outright, you could raise a warning and skip the file during importing and indexing.
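The two behaviours discussed here (fail fast, or warn and skip) could look roughly like the following Python sketch. It is only an illustration under assumptions: the function name `check_header` and the `strict` flag are hypothetical, and the check is limited to comparing the number of extra columns with the length of the filename pattern.

```python
import logging

log = logging.getLogger("refset-import")

BASE_COLUMNS = [
    "id", "effectiveTime", "active",
    "moduleId", "refsetId", "referencedComponentId",
]


def check_header(pattern, header, strict=True):
    """Return True if the header matches the filename pattern.

    On a mismatch, raise ValueError when strict, otherwise log a
    warning and return False so the caller can skip the file.
    """
    extras = header[len(BASE_COLUMNS):]
    if header[:len(BASE_COLUMNS)] == BASE_COLUMNS and len(extras) == len(pattern):
        return True
    message = (
        "refset format and naming convention mismatch: "
        f"pattern {pattern!r} implies {len(pattern)} extra column(s), "
        f"header has {len(extras)}: {extras}"
    )
    if strict:
        raise ValueError(message)
    log.warning(message)
    return False
```

For the Spanish file quoted above, the filename pattern is "c" (one extra column) while the header carries nine extra columns, so this check would report a mismatch.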
Hi @sidharthramesh - it should work now for the Indian reference sets.
fc77e8aea2c1024dac4907fbc3c731681615fda4 also fails fast if there is an issue.
Let me know how you get on.
Dear @sidharthramesh and @DharsanB : You can now use the v0.8.3 release and see whether it resolves the issue. Thanks for testing. Let me know if any issues.
clj -M:run list ~/Downloads/SnomedCT_IndiaReferenceSetsRF2_PRODUCTION_202108067T120000Z
identifies a more complete list of importable files now:
Hey, @wardle Thank you for the incredibly quick fix. We've tested it with our release files, and so far everything is getting indexed. Thanks!
Hey @wardle, I've been having issues querying members of refsets using the constraint parameter of the API.
Version: v0.8.1
Querying one of the refsets to get its members using:
http://localhost:8080/v1/snomed/search?constraint=^1131000189100
Gives a 404 Not Found error.
The release files I've used to index and search can be found here.
The query
http://localhost:8080/v1/snomed/search?constraint=^1101000189108
(members of 1101000189108 |CTV3 simple map reference set (foundation metadata concept)|) seems to work just fine.
Investigating further, looking at one of the refset files,
./SnomedCT_IndiaReferenceSetsRF2_PRODUCTION_202108067T120000Z/Snapshot/Refset/Content/der2_Refset_cardiologySnapshot_IN1000189_20210806.txt
the concept 1001000119102 |Pulmonary embolism with pulmonary infarction (disorder)| has only the following refsets:
and does not include 1131000189100, which is part of the file used to index. However, the concept 1131000189100 does exist in the server.
I believe the issue can be replicated by just using the packages
./SnomedCT_IndiaReferenceSetsRF2_PRODUCTION_202108067T120000Z
and
./SnomedCT_InternationalRF2_PRODUCTION_20210131T120000Z
(link here). It might have something to do with the naming conventions and directory structure of the files?
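To double check what a release file itself says, independently of the server, a small Python sketch like the one below can list the refsets that a concept belongs to in a given refset file. The function name is hypothetical, and it assumes a tab-separated RF2 refset file with the standard header row.

```python
import csv


def refsets_for_component(lines, component_id):
    """Return the refsetIds of active rows that reference component_id.

    lines is any iterable of text lines, e.g. an open RF2 refset file.
    """
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return {
        row["refsetId"]
        for row in reader
        if row["referencedComponentId"] == component_id and row["active"] == "1"
    }


# Example usage against the file mentioned above (path as in this thread):
# path = ("./SnomedCT_IndiaReferenceSetsRF2_PRODUCTION_202108067T120000Z/"
#         "Snapshot/Refset/Content/"
#         "der2_Refset_cardiologySnapshot_IN1000189_20210806.txt")
# with open(path, encoding="utf-8", newline="") as f:
#     print(refsets_for_component(f, "1001000119102"))
```

If the file lists a refset that the server does not report for the same concept, that points at the import step rather than at the query.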