Closed ariutta closed 2 years ago
I identified 32 busted tsvs and removed them so the site is building again. You can find them here: https://github.com/wikipathways/wikipathways.github.io/tree/main/bad%20md%20and%20tsv%20files
Tina, using a TSV/CSV library that handles escaping special characters might be the most reliable solution.
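As a rough illustration of that suggestion, here is a minimal Python sketch that lets the standard `csv` module do the escaping when writing a tab-separated row. The sample row and the helper names `write_tsv` and `read_tsv` are made up for this example, and the real datanode serializer may be written in another language entirely. Whether Jekyll's TSV parser accepts the doubled-quote escaping would still need to be checked on the site.

```python
import csv
import io

# Hypothetical datanode row; the last field contains double quotes.
row = ["Ctr9", "GeneProduct", "22083", "Entrez Gene",
       'Also called "SH2 domain-binding protein 1"']


def write_tsv(rows):
    """Serialize rows as TSV, letting the csv module quote fields as needed."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", quoting=csv.QUOTE_MINIMAL,
                        lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


def read_tsv(text):
    """Parse TSV text with the csv module's default quote handling."""
    return list(csv.reader(io.StringIO(text), delimiter="\t"))


text = write_tsv([row])
# The field with embedded quotes is wrapped in quotes and its quotes are doubled.
print(text)
# Reading the text back with the same dialect recovers the original row.
print(read_tsv(text) == [row])
```

The point of the sketch is only that writer and reader agree on one quoting convention, so special characters survive the round trip.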
This `quoting=csv.QUOTE_NONE` change fixed the parsing for my script, but it doesn't fix the parsing for the Jekyll site.
Let's remove `quoting=csv.QUOTE_NONE` when you update the serialization of the datanode TSVs.
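For context, a small sketch of what `quoting=csv.QUOTE_NONE` changes in Python's `csv.reader`. The sample line and the `parse` helper are invented for illustration and are not taken from the actual script. It only demonstrates Python's behavior; the Jekyll site parses the files on its own.

```python
import csv
import io

# Hypothetical TSV line whose last field starts with a double quote.
line = 'Ctr9\tGeneProduct\t"SH2 domain-binding protein 1" alias\n'


def parse(text, **reader_options):
    """Parse tab-separated text, passing extra options through to csv.reader."""
    return list(csv.reader(io.StringIO(text), delimiter="\t", **reader_options))


# QUOTE_NONE treats double quotes as ordinary characters and keeps them.
literal = parse(line, quoting=csv.QUOTE_NONE)
# The default dialect interprets a leading double quote as field quoting,
# so the quote characters are consumed during parsing.
default = parse(line)

print(literal[0][2])  # "SH2 domain-binding protein 1" alias
print(default[0][2])  # SH2 domain-binding protein 1 alias
```

So a file that is not serialized with quoting in mind can parse differently depending on the reader's settings, which is why the parsing option and the serialization need to change together.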
I temporarily disabled the copying of datanode TSV files over to the jekyll site: https://github.com/wikipathways/wikipathways-database/blob/7ae256201c2e6972d271e77145244de0e26fcc70/.github/workflows/on_gpml_change.yml#L226-L228
Once the TSV file serialization is working, we can re-enable that line.
@ariutta can you check if the new data nodes comment format solves the issue? I still need to fix the ordering but wanted to fix the comments problem first.
@ariutta @mkutmon What's the status of this issue? Looks like we are not copying over datanode.tsv files to the jekyll site until this is resolved.
I guess we'd need to give it a try -- just copy some over and see whether Jekyll correctly parses them. I haven't checked lately, but Tina's change may have worked.
@AlexanderPico, @mkutmon, I re-enabled datanodes.tsv: https://github.com/wikipathways/wikipathways-database/blob/834874d96bc2b4d8054da070cbe0e1d7480d8266/.github/workflows/on_gpml_change.yml#L350
We'll have to keep an eye on whether Jekyll can parse the files with the updated formatting.
I tested the following changes by editing the GPML files in wp-db and seeing the results on the new site:
So, it looks like @mkutmon's checklist is too active. I'll comment those lines out again.
Replacing all double quotes with single quotes resolves the TSV parsing issues for all the "bad" datanode.tsv cases, e.g., WP1763:
Ctr9 GeneProduct 22083 Entrez Gene full name: Ctr9 (Alco called "SH2 domain-binding protein 1")
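The replacement can be sketched in Python as below. This is an illustration only: the helper name `sanitize_field` and the sample row are assumptions, and the actual fix lives in the meta-data-action jar, not in this snippet.

```python
import csv
import io


def sanitize_field(value):
    """Replace double quotes with single quotes so no field contains a quote character."""
    return value.replace('"', "'")


# Hypothetical row modeled on the WP1763 example; the comment contains double quotes.
row = ["Ctr9", "GeneProduct", "22083", "Entrez Gene",
       'full name: Ctr9 (called "SH2 domain-binding protein 1")']

clean = [sanitize_field(field) for field in row]
line = "\t".join(clean)

# With no double quotes left, a quote-aware parser and a plain tab split agree.
parsed = next(csv.reader(io.StringIO(line), delimiter="\t"))
print(parsed == clean)
print(line.split("\t") == clean)
```

The trade-off is that the original quote characters are lost, but every parser then sees the same five fields regardless of its quoting rules.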
Made a new release of meta-data-action jar attempting to fix double quotes on comments: https://github.com/wikipathways/meta-data-action/releases/tag/v0.0.2
It worked! Now, I'm uncommenting the `cp datanodes.tsv` line as it should work for all the files previously marked "bad."
These aren't always being serialized/parsed consistently. Examples:
WP5064-datanodes.tsv
WP5114-datanodes.tsv
WP5166-datanodes.tsv
~~This may be due to the change I made here for the parameters for parsing quoted fields: https://github.com/wikipathways/wikipathways-database/commit/288ce733e1607777c6ede0b290c4639382da3e29#diff-f3a0c6cf05b703b1241cd9c83a3c9efad10e6e84ce8f1246f0ffd25e1f2b8fcbL82~~ Update: the jekyll site is parsing the TSV files independently of my script.
The annotations CSV files do use quotes to indicate fields, so I changed all the parsing to support quoted fields. @mkutmon, do you want to change the serialization of WPx-datanodes.tsv files to support quoted fields, or should we use `quoting=csv.QUOTE_NONE` here when parsing WPx-datanodes.tsv files?