icgc-argo / workflow-roadmap

Roadmap and management for genomic data processing
GNU Affero General Public License v3.0

🐛 Update `updated_at` in mappings #111

Closed rosibaj closed 3 years ago

rosibaj commented 3 years ago

Describe the bug

Error log reported by @henro001:

"stacktrace": ["org.elasticsearch.transport.RemoteTransportException: [workflow-es-data-4][10.233.81.26:9300][indices:data/read/search[phase/query]]",
"Caused by: org.elasticsearch.index.query.QueryShardException: No mapping found for [updated_at] in order to sort on"

From the indices:

{"type": "server", "timestamp": "2021-03-15T15:57:16,916Z", "level": "DEBUG", "component": "o.e.a.s.TransportSearchAction", "cluster.name": "workflow", "node.name": "workflow-es-data-5", "message": "[file_centric_1.0][2], node[AghrzU6WQDO-rZTrDgqkWQ], [R], s[STARTED], a[id=XhDFAA7FQ86ctbOyhFazsQ]: Failed to execute [SearchRequest{searchType=QUERY_THEN_FETCH, indices=[analysis_centric, analysis_centric_1.0, file_centric, file_centric_1.0, task, task-20200923, workflow, workflow-20200923], indicesOptions=IndicesOptions[ignore_unavailable=false, allow_no_indices=true, expand_wildcards_open=true, expand_wildcards_closed=false, allow_aliases_to_multiple_indices=true, forbid_closed_indices=true, ignore_aliases=false, ignore_throttled=true], types=[], routing='null', preference='null', requestCache=null, scroll=null, maxConcurrentShardRequests=0, batchedReduceSize=512, preFilterShardSize=42, allowPartialSearchResults=true, localClusterAlias=null, getOrCreateAbsoluteStartMillis=-1, ccsMinimizeRoundtrips=true, source={\"query\":{\"bool\":{\"must\":[{\"term\":{\"analysis_id\":{\"value\":\"e3b4cdf3-a644-47a2-b4cd-f3a64447a2a2\",\"boost\":1.0}}}],\"adjust_pure_negative\":true,\"boost\":1.0}},\"sort\":[{\"updated_at\":{\"order\":\"asc\"}}]}}] lastShard [true]", "cluster.uuid": "6nSVxZP7SiG5swH9g0AFjA", "node.id": "fEOpLy3gQpGYVqMs0l1NKg" ,
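For context on the error above: the failed request sorts on `updated_at` across several indices at once, and Elasticsearch rejects the sort on any shard whose index has no mapping for that field. The sort option `unmapped_type` tells Elasticsearch to treat the field as having the given type on indices where it is unmapped, instead of failing. The following is a minimal sketch of the same query with that option set. The function name `build_sorted_search` is hypothetical, and the thread does not establish that this is how the issue was resolved or which index lacked the mapping.

```python
import json


def build_sorted_search(analysis_id, sort_field="updated_at", unmapped_type="date"):
    """Build a search body like the one in the log, tolerant of unmapped sort fields.

    Hypothetical helper for illustration only; not taken from song-search or Maestro.
    """
    return {
        "query": {
            "bool": {
                "must": [{"term": {"analysis_id": {"value": analysis_id}}}]
            }
        },
        # unmapped_type lets shards without a mapping for sort_field
        # sort as if the field existed with this type and had no value.
        "sort": [{sort_field: {"order": "asc", "unmapped_type": unmapped_type}}],
    }


if __name__ == "__main__":
    body = build_sorted_search("e3b4cdf3-a644-47a2-b4cd-f3a64447a2a2")
    print(json.dumps(body, indent=2))
```

This only masks a missing mapping at query time. If the field is expected to exist in every searched index, fixing the mapping itself is the more direct route.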

Expected behaviour

jaserud commented 3 years ago

Maestro already has updated_at for file_centric and analysis_centric:
https://github.com/overture-stack/maestro/blob/7f3388a10bc9e280952cc27fb450c7314b76118c/maestro-app/src/main/resources/analysis_centric.json#L79
https://github.com/overture-stack/maestro/blob/029187d6f54b082d74e0e28516aaaa767e23c233/maestro-app/src/main/resources/file_centric.json#L106

And I have verified these are the mappings in use in dev and qa.

| env | file_centric | analysis_centric |
| --- | --- | --- |
| DEV | (screenshot) | (screenshot) |
| QA | (screenshot) | (screenshot) |

I can't see the mapping in use in prod, but it must contain updated_at (at least for analysis_centric), because we are able to sort by updatedAt on analysis with no error (screenshot).

Also, song-search is already set up to convert field names coming from ES from snake_case: https://github.com/icgc-argo/song-search/blob/24842f7c12accae88777bf093b332b346ccaae7e/src/main/java/bio/overture/songsearch/model/Analysis.java#L34
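To illustrate the naming relationship mentioned here (`updated_at` in the index versus `updatedAt` in the API), here is a small Python sketch of a snake_case to camelCase key conversion. It is an illustration only: song-search is a Java service and handles this in its own model classes, and the helper names `snake_to_camel` and `camel_keys` are hypothetical.

```python
def snake_to_camel(name):
    """Convert a snake_case field name to camelCase, e.g. updated_at -> updatedAt."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def camel_keys(doc):
    """Return a copy of a flat document with its top-level keys converted."""
    return {snake_to_camel(key): value for key, value in doc.items()}


if __name__ == "__main__":
    # Hypothetical document shaped like an index hit.
    hit = {"analysis_id": "e3b4cdf3-a644-47a2-b4cd-f3a64447a2a2", "updated_at": 0}
    print(camel_keys(hit))
```

The conversion works in one direction only, so a client sorting by `updatedAt` still depends on the index itself having a mapping for `updated_at`.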

andricDu commented 3 years ago

follow up on this

andricDu commented 3 years ago

I've confirmed that the mappings for file-centric and analysis-centric in prod have the updated_at date field in their mappings.
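The manual check described in this thread could also be scripted. The sketch below takes the JSON returned by a get-mapping request, assumed here to have the typeless shape `{index: {"mappings": {"properties": {...}}}}`, and lists the indices that have no top-level mapping for a given field. The function name and the sample data are hypothetical and do not reflect the actual cluster state.

```python
def indices_missing_field(mapping_response, field):
    """Return the sorted names of indices with no top-level mapping for `field`.

    `mapping_response` is assumed to be the parsed JSON of a get-mapping call
    in the typeless format: {index_name: {"mappings": {"properties": {...}}}}.
    """
    missing = []
    for index_name, body in mapping_response.items():
        properties = body.get("mappings", {}).get("properties", {})
        if field not in properties:
            missing.append(index_name)
    return sorted(missing)


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    sample = {
        "file_centric_1.0": {"mappings": {"properties": {"updated_at": {"type": "date"}}}},
        "example_index": {"mappings": {"properties": {"state": {"type": "keyword"}}}},
    }
    print(indices_missing_field(sample, "updated_at"))
```

Running this against every index named in the failing search request would show whether any of them lacks the field, which is what the sort error depends on.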