paleobot / pbot-dev

Codebase and initial design documents for pbot client
MIT License

Validation to prevent duplicates #100

Closed ecurrano closed 1 month ago

ecurrano commented 1 year ago

Can we do some sort of validation that a particular node (reference, collection, person, others?) does not already exist?

Validation may eventually come from the PBDB side, once the integration is built.

For the HackAThon, we will want to train people to prevent duplicates.

aazaff commented 1 year ago

Let's discuss/investigate this before the August meeting. Agree we should not actually change anything before then, but just a little bit of time to map it out would be wise.

clairecleveland commented 1 year ago

The HackAThon demonstrated that this should be a high priority. Doug and Andrew are already working on this and need a decision on exactly where the duplication checks should be placed.

NoisyFlowers commented 1 year ago

First pass of this is in paleobot/pbot-api@0e4689d057289cee8283f9a79fd20533e07d21dd paleobot/pbot-api@397a39e9723fd99c2fe1baa60f0a8f26a82aeaa7 paleobot/pbot-api@17a3fef9aee73638b65adf346a44ed053f393e50

A duplication error is thrown by the following constraint violations:

Reference
Schema
    existing node with same node type AND same title 

OTU
Specimen
Description
Collection
Group
    existing node with same node type AND same name

Person
    existing Person node with same given AND middle AND surname

Image
    existing Image node with IMAGE_OF relationship to same node AND same link

Character
    existing Character node with same name and CHARACTER_OF relationship to same parent

State
    existing State node with same name and STATE_OF relationship to same parent

Synonym
    existing Synonym node with SAME_AS relationships to same OTUs
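
For anyone trying to picture how the constraints above fit together, here is a rough in-memory sketch in JavaScript. It is not the code from the linked commits. The field names (`parentID`, `imageOfID`, `otuIDs`) and the `{ nodeType, node }` shape are assumptions made up for illustration, and it compares values by exact match, which the thread does not specify.

```javascript
// Sketch only: field names such as parentID, imageOfID and otuIDs are
// hypothetical stand-ins, not the actual pbot-api schema.
const byName = (n) => [n.name];
const byTitle = (n) => [n.title];

// For each node type, the values that must all match for a duplicate.
const DUPLICATE_FIELDS = {
  Reference: byTitle,
  Schema: byTitle,
  OTU: byName,
  Specimen: byName,
  Description: byName,
  Collection: byName,
  Group: byName,
  Person: (n) => [n.given, n.middle, n.surname],
  Image: (n) => [n.imageOfID, n.link],
  Character: (n) => [n.name, n.parentID],
  State: (n) => [n.name, n.parentID],
  // Sort a copy so the same set of OTUs matches regardless of order.
  Synonym: (n) => [...n.otuIDs].sort(),
};

// Returns a comparable key, or null when the type has no duplicate check
// (for example CharacterInstance).
function duplicateKey(nodeType, node) {
  if (!Object.prototype.hasOwnProperty.call(DUPLICATE_FIELDS, nodeType)) {
    return null;
  }
  // Including nodeType keeps a Reference and a Schema with the same title apart.
  return JSON.stringify([nodeType, ...DUPLICATE_FIELDS[nodeType](node)]);
}

// Returns the first existing entry the candidate would duplicate, else undefined.
function findDuplicate(existing, nodeType, node) {
  const key = duplicateKey(nodeType, node);
  if (key === null) return undefined;
  return existing.find((e) => duplicateKey(e.nodeType, e.node) === key);
}
```

Calling `findDuplicate(existing, 'Person', candidate)` returns the matching entry when given, middle, and surname all agree, and `undefined` otherwise; a caller could throw the duplication error in the first case.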

Note that there is no dup check for CharacterInstances. This is because CharacterInstance is the only node type whose create and update functionality is handled in Cypher rather than JavaScript. Adding the dup check there is more complicated, and I'm not sure it's worth the effort. Opinions?

If it were implemented, the constraint would be:

CharacterInstance
    existing node with DEFINED_BY relationship to same Description node AND INSTANCE_OF relationship to same Character node AND HAS_STATE relationship to same State node
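
To make the CharacterInstance constraint concrete, it could be expressed roughly as the Cypher below, wrapped in a small JavaScript helper. This is only a sketch: the `pbotID` property, the parameter names, and the helper itself are assumptions, and the relationship patterns are left undirected because the thread does not state their directions.

```javascript
// Sketch only: pbotID and the undirected relationship patterns are assumptions;
// the real pbot-api schema and relationship directions may differ.
const CHARACTER_INSTANCE_DUP_QUERY = `
  MATCH (ci:CharacterInstance)-[:DEFINED_BY]-(:Description {pbotID: $descriptionID}),
        (ci)-[:INSTANCE_OF]-(:Character {pbotID: $characterID}),
        (ci)-[:HAS_STATE]-(:State {pbotID: $stateID})
  RETURN count(ci) > 0 AS isDuplicate
`;

// Hypothetical helper: pairs the query with its parameters so a caller can
// hand both to whatever Neo4j session it already uses.
function buildCharacterInstanceDupCheck(descriptionID, characterID, stateID) {
  return {
    query: CHARACTER_INSTANCE_DUP_QUERY,
    params: { descriptionID, characterID, stateID },
  };
}
```

The query returns a single `isDuplicate` boolean, so the create path could run it first and raise the duplication error when it is true.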

aazaff commented 1 year ago

I cannot think of an example at this moment, but I think we might want to allow dup character instances anyway… I feel like that is correct, but again no examples yet. Anyway, good work.

clairecleveland commented 1 year ago

The proposed fix should work. Even when a single fruit or seed has several distinct layers that are each described by the same character names, the nodes are different. Layer 1, layer 2, etc. would be different headers (the node relationships differ downstream) with the same options of character names and state names for each. This should all still be covered by what Doug proposes.

I feel like duplicate character instances should be a rare mistake because of the way they are entered. The duplication would be right there for the user to see as they enter it. However, we’ve already had complaints about people being so many keystrokes ahead when entering character instances that duplicates could be an issue if people are working ahead of the screen refresh. Even though we can say this was a WiFi issue at the HackAThon, it’s going to be an issue even at good WiFi speeds once people are familiar and cranking through their data entry. Descriptions are a slow (“cumbersome”) process right now.

If duplicates were allowed to exist, we’ll have to figure out how to handle that in reporting, e.g. synonymies and other comparisons. So which is easier: preventing the duplicates or handling them in reporting?

I think fixing the speed of character instance entry is the highest priority, which in turn ensures duplication errors are rare. I'm not sure whether fixing duplication errors is harder than accommodating duplicates in reports, particularly synonymies.
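
For comparison, here is a rough idea of what the reporting-side option might look like: collapse character instances that share the same Description, Character, and State before building a synonymy or other comparison. This is only a sketch; the `descriptionID`, `characterID`, and `stateID` field names are assumptions, not the actual report data shape.

```javascript
// Sketch only: descriptionID, characterID and stateID are hypothetical field
// names standing in for the DEFINED_BY, INSTANCE_OF and HAS_STATE targets.
function dedupeCharacterInstances(instances) {
  const seen = new Set();
  // Keep the first instance for each (description, character, state) triple.
  return instances.filter((ci) => {
    const key = JSON.stringify([ci.descriptionID, ci.characterID, ci.stateID]);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}
```

The filter itself is small, but every report that reads character instances would need to apply it, which is part of the trade-off against blocking duplicates at entry.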

doricon commented 1 year ago

Also put some checks in the specimen form for the specimen number (aka name).

Great work!