ncihtan / data-models

Schema.org Data Models for HTAN
MIT License

Remove spurious line feeds from Description fields #39

Closed vthorsson closed 2 years ago

vthorsson commented 2 years ago

Information

Proposed change:

The Description field for "AJCC Clinical M" contains a line feed, separating "Extent of the distant metastasis for the cancer based on evidence obtained from clinical" and "assessment parameters determined prior to treatment." This line feed does not seem to be in the originating reference CDE description:

https://cdebrowser.nci.nih.gov/cdebrowserClient/cdeBrowser.html#/search?publicId=3440331&version=1.0

The linefeed can also cause problems for downstream processing.
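To illustrate the kind of downstream problem meant here, the following is a minimal sketch using synthetic data (the two-column layout and the description text are assumptions for illustration, not the actual contents of HTAN.model.csv). It shows that a tool reading the file line by line sees more lines than there are records, while a CSV-aware parser keeps the quoted field together:

```python
import csv
import io

# Synthetic sample (hypothetical layout): the first Description contains
# an embedded line feed inside a quoted field.
sample = (
    'Attribute,Description\n'
    '"AJCC Clinical M","evidence obtained from clinical\n'
    'assessment parameters determined prior to treatment."\n'
    '"Anaplasia Present","Placeholder description."\n'
)

# Naive line-based processing counts physical lines, not records.
physical_lines = sample.splitlines()

# A CSV-aware parser treats the quoted, multi-line field as one value.
records = list(csv.reader(io.StringIO(sample, newline="")))

print(len(physical_lines))  # 4 physical lines
print(len(records))         # 3 records (header + 2 rows)
```

Any tooling that relies on physical line numbers, such as `grep`, `wc -l`, or a line-oriented diff, will therefore disagree with the row numbers seen by a spreadsheet or CSV library.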

This problem occurs in the current HTAN.model.csv from line 621 ("AJCC Clinical Stage") through line 629 ("Anaplasia Present"), inclusive, with around 14 line feeds in total.

This region of the csv seems to be the only one with this problem.

Manual conversion of these 14 to spaces is recommended.

I have generated a candidate replacement file for HTAN.model.csv by making a copy of HTAN schema v22.02, making manual corrections on that gsheet, and exporting to csv.
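For reference, the same correction could also be scripted rather than done by hand. The sketch below is not the procedure used for the candidate file above; it is a hypothetical helper (`strip_embedded_newlines` is a name introduced here) that replaces line feeds inside any CSV field with a single space and reports how many fields were changed:

```python
import csv
import io
import re
from typing import Tuple

# Matches one or more line breaks plus any whitespace directly around them.
_LINE_BREAK = re.compile(r"\s*[\r\n]+\s*")


def strip_embedded_newlines(csv_text: str) -> Tuple[str, int]:
    """Replace line feeds inside CSV fields with single spaces.

    Returns the cleaned CSV text and the number of fields that changed.
    """
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    changed = 0
    for row in reader:
        cleaned_row = []
        for field in row:
            cleaned = _LINE_BREAK.sub(" ", field)
            if cleaned != field:
                changed += 1
            cleaned_row.append(cleaned)
        writer.writerow(cleaned_row)
    return out.getvalue(), changed


# Synthetic example; the real input would be the text of HTAN.model.csv.
sample = (
    'Attribute,Description\n'
    '"AJCC Clinical M","obtained from clinical\nassessment parameters"\n'
)
cleaned_text, n_changed = strip_embedded_newlines(sample)
print(n_changed)
```

Note that rewriting the file with `csv.writer` may change quoting on fields that did not need it, so the resulting diff should be reviewed before opening a PR.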


vthorsson commented 2 years ago

@adamjtaylor can you advise on how to also get this change into HTAN schema v22.02? Would it make sense to implement it there first, then do the .csv export and proceed with the steps above? Or is the more usual procedure to import a (PR approved) csv to the gsheet?

adamjtaylor commented 2 years ago

@vthorsson The "new" process is to open a PR editing the csv in a new branch. The Google sheet will be retained for commenting and visualisation purposes only. It now has a "latest" tab which automatically reflects the current data model in main. @elv-sb and I are still working out the exact process for release versioning/naming, but we will clarify as we work it out!

adamjtaylor commented 2 years ago

Helpfully, this fix also makes the csv preview render properly in GitHub.

vthorsson commented 2 years ago

Thanks @adamjtaylor for the background material above. Good to know that removing pesky line feeds will take care of various other things.