Closed vmarkovtsev closed 6 years ago
After updating core-retrieval to master, I get
INFO[0004] start processing repos workers=32
WARN[0004] empty repository repo=0162bb0b-5d2a-5a9c-62cf-5a81779e5db9
WARN[0004] empty repository repo=0162bb0b-5d28-7a05-3aca-46f5d0c88c1f
WARN[0004] empty repository repo=0162bb0b-5d2d-89ae-6355-d442830057ee
WARN[0004] empty repository repo=0162bb0b-5d2c-cd4a-dab7-2c92e8fa4043
WARN[0004] empty repository repo=0162bb0b-5d2e-9274-f90c-7063fb2ee658
INFO[0004] finished processing all repositories failed=0 processed=5 total=5
Borges schema changed, that’s why it’s failing.
I need to find the proper commit where everything works. Is that possible in theory, @erizocosmico, or has borges drifted too far out of sync with the indexer?
The p.dbRepo.References foreign key does not work for some reason. The schema seems to be in order...
I looked through the code, everything looks fine but the foreign key is empty for some reason. I am really curious what the problem will be.
borges versions that will work with the old schema are 0.11.x ones. You can get the borges binary from here: https://github.com/src-d/borges/releases/tag/v0.11.4
The old schema stored the references in jsonb format in a column of the repositories table; we didn't have foreign keys.
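To make the difference between the two layouts concrete, here is a minimal Go sketch. It only illustrates the description above (references embedded as JSON on the repository row versus references stored as separate rows linked by a foreign key). Every type and function name in it is hypothetical and simplified; none of it is taken from borges, borges-indexer, or core-retrieval, and it is not a diagnosis of the failure reported in this issue.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Reference is a hypothetical, simplified git reference record.
type Reference struct {
	Name string `json:"name"`
	Hash string `json:"hash"`
}

// oldRepositoryRow mimics the old layout: references serialized as JSON
// inside a column of the repositories table (assumed shape).
type oldRepositoryRow struct {
	ID         string
	References []byte // JSON payload, stands in for a jsonb column
}

// newReferenceRow mimics the new layout: one row per reference, linked to
// its repository through a foreign key (assumed shape).
type newReferenceRow struct {
	RepositoryID string
	Reference
}

// refsFromOldRow decodes the embedded JSON column.
func refsFromOldRow(row oldRepositoryRow) ([]Reference, error) {
	if len(row.References) == 0 {
		return nil, nil
	}
	var refs []Reference
	if err := json.Unmarshal(row.References, &refs); err != nil {
		return nil, err
	}
	return refs, nil
}

// refsFromNewTable selects the rows whose foreign key matches repoID.
func refsFromNewTable(table []newReferenceRow, repoID string) []Reference {
	var refs []Reference
	for _, row := range table {
		if row.RepositoryID == repoID {
			refs = append(refs, row.Reference)
		}
	}
	return refs
}

func main() {
	old := oldRepositoryRow{
		ID:         "repo-1",
		References: []byte(`[{"name":"refs/heads/master","hash":"abc123"}]`),
	}
	refs, err := refsFromOldRow(old)
	if err != nil {
		panic(err)
	}
	fmt.Println("old layout, embedded column:", len(refs))

	// A reader written for the new layout finds nothing when the data was
	// only stored in the old layout, because the references table is empty.
	fmt.Println("new-style reader on old data:", len(refsFromNewTable(nil, old.ID)))
}
```

In this toy model, a reader that only consults the separate references table sees a repository with no references whenever the data lives in the embedded column, which is the kind of mismatch a schema change between versions can produce.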
@ajnavarro I updated the core-retrieval package in borges-indexer locally and ran it; it now uses exactly the same version as the current borges. It compiled and almost worked, as seen in the logs... Would it be hard to update borges-indexer, or could you at least point me to where I should investigate? The schema is the same on both ends, so the fix should be easy.
There shouldn't really be anything to do in borges-indexer besides updating core-retrieval to the latest version.
I assure you that this is what I did...
I can post a DB dump here if you want.
Don't worry, I'll take a look whenever I take this issue. For the time being, use borges 0.11.x as it's the version that we used when this was written.
This means writing siva files again, but it looks like the only way now.
@ajnavarro @erizocosmico bump
I don't know if I'm wrong, but this is not a priority for us (@smola , @mcuadros ?). You can still use the borges version that we used to fetch PGA, and then use the borges indexer.
We are going to present these tools to the community on May 30th and they are currently broken.
The issue is aligned to https://github.com/src-d/okrs/issues/14
Not at all, in my opinion. The problem here is that an outdated temporary tool created for a specific project is not working with the latest borges version. It's not working because we are updating and improving borges to reach that OKR.
@vmarkovtsev Is there any problem with presenting the process for PGA generation as using a specific borges and borges-indexer version? You can even link to the exact GitHub release pages with binaries, at least for borges. We could also publish a working binary of borges-indexer here if needed.
I don't see a problem in presenting and documenting borges-indexer as what it is: a quick tool done for generation of the first version of the dataset and that is likely to not be present in the process for future versions of the dataset.
@smola Recent borges versions include important bugfixes which allow cloning more repositories.
Most of the people there have Windows and we do not provide binary releases for it. This means that they have to clone a repo to the specific directory under src, fetch the specific revision which is known to work, build it and run it. Updating borges-indexer would at least remove the step of checking out the specific revision and let them stick with a go get one-liner.
Update borges-indexer dependencies to make it work with the new schema on borges versions >= 0.12.x.
These changes will make borges-indexer fail with prior versions. In other words, with this new version it will be impossible to regenerate the index file from the current PostgreSQL-PGA database, which uses the borges 0.11.x schema.
No other changes will be made to borges-indexer, such as adding new columns; it will just be made compatible with the new schema.
Caveats @smola @mcuadros ?
@vmarkovtsev will you need to use the up-to-date borges-indexer with our current (old) PostgreSQL PGA database?
https://github.com/src-d/datasets/issues/48#issuecomment-388327027
@smola There is no such need.
I run borges consumer and it writes several siva files and records to the DB successfully. Then I run borges-indexer and get