Open jacobwegner opened 1 year ago
Thanks for this @jacobwegner!
I just wanted to document a few issues that I've run into while getting set up -- I'm patching some of these as I go, but we might want to do a sweep once the ROI feature is ready to go.
TEXT_ANNOTATION_KIND_SCHOLIA
and TEXT_ANNOTATION_KIND_SYNTAX_TREE
appear to have been moved from models.py
to constants.py
. I changed the import statement in importers/text_annotations.py
to reflect the new location.apply_token_annotations()
is being called with reset=True
in ingestion_pipeline.py
. Removing the keyword argument (which isn't being used by apply_token_annotations()
anymore) appears to fix the issue.importers/named_entities.py#_apply_entities
appears to be failing. I've added a try-catch with some debugging output. UPDATE 2: It looks like the data/ directory that's included with the repo is outdated; deleting that and re-running python manage.py stage_atlas_data
creates data that looks like what the script expects.GrammaticalEntry
, GrammaticalEntryCollection
, and TextAnnotationCollection
models that are imported in scaife_stack_atlas/temp.py
-- I've commented those imports out, along with the functions that use them (and the calls to those functions in the settings.py pipeline)TextAnnotation
seems no longer to belong to a collection
as expected in temp.py"ve_ref"
column in the CSV that's read by extract_ref_and_token_position()
-- the first row looks like:
{
"uuid":"t1.1_0",
"value":"μῆνιν",
"word_value":"μῆνιν",
"lemma":"μῆνις",
"gloss":"",
"part_of_speech":"noun",
"tag":"n-s---fa-",
"case":"accusative",
"mood":""
}
Judging by the regex at the top of the file, it looks like "ve_ref"
maybe used to point to what's now the uuid column -- but the format has changed from what the regex expects, so the assert match
statement fails.
I updated the regex and changed the lookup, and that mostly seems to be working. The problem now occurs when we get to update_version_tokens()
: not all of the tokens exist in the CSV, so the lookup fails there.
As an aside, I'm also working on getting this working outside of Docker, as I worry that that extra layer is adding complexity for this initial setup phase.
I'll continue to update here as I encounter/fix the above issues. Some of the problems here might be user error on my part -- please let me know if that seems like the case!
@pletcher: I'm very sorry about all this...I think I've caused some additional confusion by not having better documentation (in the existing README.md
file or in this issue we're using to track updates!)
I'd be happy to sync up on a call or Slack to walk through some of these issues. temp.py
is a gigantic hack that I want to revisit so we have proper "abstract" implementation of pipelines in scaife-viewer-atlas
and project-level overrides in places like beyond-translation-site
.
As far as all of the errors you're seeing, it's likely because you've checked out scaife-viewer-atlas
as of main, but beyond-translation-site
is using the branch atlas/grammar). I really want to get proper release processes for scaife-viewer-atlas
so we can avoid issues like this.
To resolve your issues, I'd suggest stashing / copying the actual changes you needed to make to the GraphQL schema schema.py
, and then creating a branch off of atlas/grammar
:
# assuming git clone git@github.com:scaife-viewer/backend.git
git stash
git checkout atlas/grammar
git checkout -b atlas/<your-feature-name>
git stash pop
# continue on to changes to `scaife_viewer/atlas/schema.py`
# commit changes to atlas/<your-feature-name>
# push changes to GtiHub
After you've pushed any changes to GitHub for ATLAS, as mentioned above, we can update the pinned requirement in requirements.txt
:
scaife-viewer-atlas @https://github.com/<your-org>/<your-fork-repo>/archive/<your-commit-sha>.zip#subdirectory=atlas
Again, I'm very sorry for the inconvenience, and let me know if a call or screenshare would help.
@jacobwegner Please don't apologize! I just wanted to keep track of what I was running into.
As far as all of the errors you're seeing, it's likely because you've checked out scaife-viewer-atlas as of main, but beyond-translation-site is using the branch atlas/grammar). I really want to get proper release processes for scaife-viewer-atlas so we can avoid issues like this.
Ah, okay -- I was starting to wonder if that was the case. I'll checkout the correct branch and see if I can get set up that way. I don't think I'll need to do a stash pop yet since the changes that I've made so far have been focused on getting the data loaded.
Thanks for this again -- I'm happy to keep updating here, if that's okay?
@pletcher: Updates here are fine...especially because I'm aware of where things aren't always tidy / clear for other developers to jump in, I'm happy to be available synchronously too so you're not paying a productivity tax due to my lack of clarity :-)
Thanks! I think that'll be good to do soon as well, maybe late this week or early next? I'll send you an email :)
Another update (2023-01-10): Switching to the atlas/grammar
branch in the backend repo fixed almost everything. 🎉 It seems like the data are now just committed to the repo -- deleting data/
and running python manage.py stage_atlas_data
led to errors in the ingestion process, but it went mostly fine (see below) with the committed data.
I had to bump a couple of dependencies to the latest minor versions (pandas to 1.5.2 and lxml to 4.9.2) because of being on an M1 Mac.
And in temp.py#process_file()
, I added a check/early return in case data
is a list
: I think temp.py#get_paths()
is pulling in the "manifest" file at data/annotations/text-alignments/alignments-tlg0012.tlg001.json
, which errors out at the data["versions"]
lookup.
I had to bump a couple of dependencies to the latest minor versions (pandas to 1.5.2 and lxml to 4.9.2) because of being on an M1 Mac.
Totally fine; feel free to commit those as part of the PR
Re: stage_atlas_data
....another README.md
change I can make. This repository is an iteration of a few early repositories that gradually evolved to https://github.com/scaife-viewer/scaife-stack.
We decided not to ship data in the scaife-stack
repository, but I've found it was easier to make the entire repository buildable with just git checkout
.
It looks like we updated this in the top-level README.md
, but not in backend/README.md
.
@pletcher raised an issue, in that we don't have documentation yet on working with a local copy of the ATLAS package within the
docker-compose
environment.I've been doing most of my development on my local machine using a
virtualenv
and then usingdocker-compose
to browse locally or for deployments.I wanted to capture a few notes here and will turn that into a documentation update.
Working with a local checkout of the
scaife-viewer-atlas
package withindocker-compose
1) Clone https://github.com/scaife-viewer/backend into the
backend/
directory:Since this directory is mounted into the container, its contents can be edited locally or from within the container.
2) Install the local copy within a running Docker container
or
3) Edit
requirements.txt
to install the local copy the next time the container is builtAfter changes have been verified within
scaife-stack-atlas
, we should publish a new version of the package and updaterequirements.txt
to use the remote version so that work can be reviewed by other developers / deployed on a review app instance.