Open PonteIneptique opened 3 years ago
Here is what to do to load a sqlite dump in a postgresql database:
DELETE FROM word_token WHERE corpus IN (SELECT DISTINCT wt.corpus FROM word_token as wt LEFT OUTER JOIN corpus as c ON c.id = wt.corpus WHERE c.id is NULL);
DELETE FROM token_history WHERE word_token_id IN (SELECT DISTINCT th.word_token_id FROM token_history as th LEFT OUTER JOIN word_token as wt ON wt.id = th.corpus WHERE wt.id is NULL);
DELETE FROM token_history WHERE corpus IN (SELECT DISTINCT th.corpus FROM token_history as th LEFT OUTER JOIN corpus as c ON c.id = th.corpus WHERE c.id is NULL);
DELETE FROM bookmark WHERE corpus_id IN (SELECT DISTINCT b.corpus_id FROM bookmark as b LEFT OUTER JOIN corpus as c ON c.id = b.corpus_id WHERE c.id is NULL);
DELETE FROM bookmark WHERE token_id IN (SELECT DISTINCT b.token_id FROM bookmark as b LEFT OUTER JOIN word_token as wt ON wt.id = b.token_id WHERE wt.id is NULL);
DELETE FROM favorite WHERE corpus_id IN (SELECT DISTINCT f.corpus_id FROM favorite as f LEFT OUTER JOIN corpus as c ON c.id = f.corpus_id WHERE c.id is NULL);
DELETE FROM corpus_custom_dictionary WHERE corpus IN (SELECT DISTINCT ccd.corpus FROM corpus_custom_dictionary as ccd LEFT OUTER JOIN corpus as c ON c.id = ccd.corpus WHERE c.id is NULL);
python manage.py db-create
pyrrha_pgload.txt
with information about source and target databases (the file is here:
pyrrha_pgload.txt )
d the file with this comment)pgloader pyrrha_pgload.txt
(I added .txt
in file name only because github asked it, it's only a file with a command in it)
@FrFerry I have only one error during the pgloader moment:
2022-03-28T13:09:13.673334+02:00 ERROR Database error 22001: valeur trop longue pour le type character varying(64)
CONTEXT: COPY corpus, ligne 401, colonne name : « Extrait des Lunettes de Princes (ca 1461-1465) de Jean Meschinot, d’après le manuscript BnF, fr. ... »
Result: the corpus table is empty.
Yet I can create fresh new corpora and manage them as will, so nothing apparently broke atm.
@FrFerry I have only one error during the pgloader moment:
2022-03-28T13:09:13.673334+02:00 ERROR Database error 22001: valeur trop longue pour le type character varying(64) CONTEXT: COPY corpus, ligne 401, colonne name : « Extrait des Lunettes de Princes (ca 1461-1465) de Jean Meschinot, d’après le manuscript BnF, fr. ... »
Result: the corpus table is empty.
Yet I can create fresh new corpora and manage them as will, so nothing apparently broke atm.
Well ... indeed we have more than 64 characters. These kind of restriction wasn't check by sqlite ?
In my opinion we have 2 solutions here:
What do you think ?
We should remove the limit, or set it to a higher count, such as 254
We should remove the limit, or set it to a higher count, such as 254
I pushed the modification on the pull request.
Even though it now seems to be working, I would not merge the PR before the tests are fixed (too many failures related to sql constraints) or investigated. Might be discussed/worked in an other issue though.
Hi, I'm taking over for @FrFerry for this ticket.
@FrFerry told me that the tests are mainly broken because of constraints that SQLite did not check but PostgreSQL does, but that these are not related to the behavior of the application. He proposed to not migrate the tests to PostgreSQL and instead to continue to use SQLite for the tests. What is your position on this?
I also opened a new PR with @FrFerry 's commits and my own changes to continue working on this ticket, could you close https://github.com/hipster-philology/pyrrha/pull/274 to avoid confusion?
I am currently trying to install the application using PostgreSQL - however, I cannot find the information on which version we have agreed on (if any) ? I'd go for PostgreSQL 14 unless you have some restrictions on your side ?
Please go for it. AFAIK, we should have no issues with PSQL 14 and I am not sure that there would be huge breaking change with 13 :)
I've done my tests with the v13.6 so far and its ok but please, proceed with v14 :)
OK, thanks for your reply!
Just to clarify this point, when I run the tests on https://github.com/hipster-philology/pyrrha/pull/280 they fail
when you say the tests need to be fixed, do you mean the tests running using SQLite? Do you want us to port the tests as well to PostgreSQL?
Hi @carinedengler, you should really check the current version of the dev branch, as it includes fixes for the current sqlite-only version of the application. Some tests were known for having a chance of failing (80% of the time), this seems to have fixed it. This also raises requirements version for security reasons.
I checked the dev branch out this morning and rebased the postgresql branch on it, the 3 failing tests (using SQLite) I have are related to the changes made for the PostgreSQL port. The question is therefore whether you would want me to fix the tests so they can be run on SQLite after the port to PostgreSQL or whether you would want me to port the SQLite tests to PostgreSQL ?
The change I am talking about seems to have been merged after your last commits ( https://github.com/hipster-philology/pyrrha/pull/281 ), are you sure you are up to date ? I pinged you on the PR with the current PR fixing those tests.
I must have fetched/rebased before you committed your latest changes, I should be up to date now
should I correct the tests for SQLite or for PostgreSQL ? or both ?
should I correct the tests for SQLite or for PostgreSQL ? or both ?
The application should run on both SQLite and PostgreSQL, as it is still used locally by some researchers.
CI Tests should also actually have a version for SQLite and one for PostgreSQL
OK, thanks for the clarification
when creating a new development instance with python manage.py run
what should be the default DBMS used ?
I think the default in dev mode should be SQLite, but we could have a new dev settings as Dev-Postgresql to facilitate work with this one
I think the default in dev mode should be SQLite, but we could have a new dev settings as Dev-Postgresql to facilitate work with this one
Could you give an example how you would use that? Since db.engine.dialect.name can be used to retrieve the DBMS currently in use, and DEV_DATABASE_URL allows for passing both SQLite and PostgreSQL accesses, it seems to me that introducing another setting is not necessary.
I think I would simply have a new config object in the config.py file that could be used from with the manage CLI
So I'd go with the information retrieval you talk about and the DATABASE URI parameter, yes :)
OK, I get your idea now :+1:
the tests should now be fixed for SQLite - I won't have time before the end of the week (maybe even start next week) to have a look at the PostgreSQL tests, but introducing a switch for the tests for SQLite or PostgreSQL and fixing the PostgreSQL tests will be the next items on the list
Is your feature request related to a problem? Please describe.
The current app shows limits with SQLite: it's slow, and does not benefit from a fast SQL engine when having different users doing multiple writes.
Describe the solution you'd like