helxplatform / dug

Semantic Search
MIT License
32 stars 10 forks source link

Sorted lists for json serialization for parser and annotator outputs DUG-408 #336

Closed mbacon-renci closed 10 months ago

mbacon-renci commented 10 months ago

As we try to use LakeFS to avoid re-processing on unnecessary changes, sometimes the output is detected as a change when no real data have changed. This is usually an ordering problem in an unordered list or key:value pair. The below URL has a LakeFS commit that should not have occurred.

https://lakefs.apps.renci.org/repositories/yk-heal/commits/e0739329347be52164815581c3d819cb4712ed5e71d66214e0888b3950c03908?prefix=annotate_and_index%2Fcrdc_dataset_pipeline_task_group.crawl_crdc%2Fphs000178.v11.pht000804.v9.TCGA_Subject.data_dict%2F

A code modification to the JSON output will be needed to fix this, after some investigation.

mbacon-renci commented 10 months ago

Ah, that’s my mistake. I somehow understood from you that we needed to modify jsonable, which I couldn’t figure out how it got used but figured there was some code somewhere that I was missing.

Yes, we can just keep the terms sorted. I should be able to get a PR together on that by early this afternoon hopefully.

From: YaphetKG @.> Date: Thursday, January 11, 2024 at 4:25 PM To: helxplatform/dug @.> Cc: Bacon, Michael T @.>, Author @.> Subject: Re: [helxplatform/dug] Sorted lists for json serialization for parser and annotator outputs DUG-408 (PR #336)

@YaphetKG requested changes on this pull request.

Hi Michael , I don't think sorting them in the jsonable methods helps us. Jsonpickle doesn't rely on that method when generating the json files in roger. But I was thinking in methods like (https://github.com/helxplatform/dug/blob/develop/src/dug/core/parsers/_base.py#L58) if we sort the lists here, right after modification , maybe that could work.

— Reply to this email directly, view it on GitHubhttps://github.com/helxplatform/dug/pull/336#pullrequestreview-1816741042, or unsubscribehttps://github.com/notifications/unsubscribe-auth/A2LNIAPHOJSKISWNH4AKIN3YOBKGJAVCNFSM6AAAAABBVUQ3F2VHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMYTQMJWG42DCMBUGI. You are receiving this because you authored the thread.Message ID: @.***>

mbacon-renci commented 10 months ago

@YaphetKG I have re-done the patch to sort on assignment or change rather than on output. I am also happy to address @vladimir2217's concerns about exposing self.__dict__ if we want to add that to the change as well.

mbacon-renci commented 10 months ago

Also, this is apparently failing Trivy results but I can't find the result output to figure out what vulnerability needs to be addressed.

YaphetKG commented 10 months ago

Also, this is apparently failing Trivy results but I can't find the result output to figure out what vulnerability needs to be addressed.

The Scan results can be found at

image

going to Checks in the top of this PR then on the next page here

image

Going to the Code Scanning Results > Trivy scans Then View all branch results

YaphetKG commented 10 months ago

Did you run this in roger ? would be nice if it is tested and ready to go. It looks good

mbacon-renci commented 10 months ago

Did you run this in roger ? would be nice if it is tested and ready to go. It looks good

I'm getting some DAG import errors which I think are unrelated to the code changes I made and have to do with an error in linkml-runtime 1.6.0. I'm trying to bump some versions in part to fix that and in part to get past some of the pile of vulnerabilities that are showing up on scan.

joshua-seals commented 10 months ago

I have investigated why trivy is flagging vulnerabilities, assuming it to be the caching we were doing for image builds to no avail.

I believe Trivy is somehow confused as to the "closed" vulnerability from the python docker image, which does seem to be the culprit. This could be anomalous (one time error) or could be a bug in trivy. The most simplistic way with least engineering overhead, to test the theory is letting this pr merge "bypassing the checks" for now to see if this behavior recurs on the next pr. Given that previous fixes did not experience this problem and the trivy action itself has seen no significant changes since it's last release I am somewhat confident this will not persist past the present occurrence.

YaphetKG commented 10 months ago

Image version changed to a pached alpine on PR #337 . Code tested and works. Pulled in other branch. Closing this as Complete.