Evaluate / document idx field

jacobwegner commented 4 years ago

The goal is that idx values are a 0-based index across text part branches, but I think we've gotten a bit sloppy with that application.

@eelkevdbos raised a couple of points that I think are important:

1) Django querysets don't have an explicit order_by clause. SQLite (used by ATLAS) defaults to rowid for ordering (https://www.sqlite.org/queryplanner.html#_sorting_by_rowid), so effectively we have had an "implicit" order based on pk. 2) Ingestion has typically been done so that pk and idx values have a similar order.

2) If there is a use case where data is re-ordered after ingestion, we should ensure that:

idx is the best place to store that order
queries are ordered by idx within ATLAS's GraphQL endpoint

eelkevdbos commented 4 years ago

A couple of additional thoughts:

In our case, order_by('idx', 'pk') does not hurt us because ingestion left the idx empty. If it would not be empty but would effectively follow pk, this would not matter as well. Should we then consider sorting idx, pk a default?
Performance wise, should we consider adding an index to the idx column (even if we do sorting inside a resolver)?
As for integration with the CTS ecosystem. We currently use a <scaife:idx /> element inside of a work cts file to setup custom sorting. Is this the correct place to discuss validity of such an approach? This approach would obviously not work for a textpart and might be something to consider unifying.

jacobwegner commented 4 years ago

@eelkevdbos following up on our call, I'll continue to dig in to how idx is currently implemented; I definitely see what you're talking about where we have 0 for multiple nodes beneath a work.

Before we get to changing order by or index, I think the more appropriate approach may be to make sure that the sorting defined in CTS collections is properly reflected in the default Node ordering in ATLAS.

The CTSCollectionResolver iterates through text groups, works and versions using the ordering provided by each object class in cts.collections: https://github.com/scaife-viewer/backend/blob/22f2fe26e22bfa9e09dfc78b0f253b29bf4c3bef/atlas/scaife_viewer/atlas/resolvers/cts_collection.py#L12

In SE, there has been customizations to enforce ordering (using SORT_OVERRIDES and TEXTKIND_ORDER constants) that I think could be adapted to the hookset being added in #23.

So ATLAS would depend on the ordering in CTS resolvers (rather than having to implement a separate sort).

The internal element could be <scaife:idx /> or <scaife:sort-position /> as-desired.

For scaife.perseus.org, I think we'd continue to use the sorts specified in the hookset defaults.

When you're able to share whatever WIP you have within SE, I can help to adapt SE to utilize the patterns in #23; I don't think SE is yet at the point where we can simply swap in the scaife-viewer-core package, but I want to keep working in that direction.

jacobwegner commented 4 years ago

The scaife-viewer-core@0.1-a3 release adds the hookset functionality.

I think for now we can use it to continue to "piggyback" in the row id ordering in SQLite.

scaife-viewer / backend

Evaluate / document idx field #22