sgsinclair / Voyant

GNU General Public License v3.0

Corpus ingestion fails when using /dtoc/ URL #431

Open ajmacdonald opened 6 years ago

ajmacdonald commented 6 years ago

Returns a "corpus is empty" error.

sgsinclair commented 6 years ago

Not sure I understand. Is this new? Do none of the DToC URLs work now? Right now I'm having trouble reaching Voyant, but it seems like everything at McGill is out of reach at the moment.

ajmacdonald commented 6 years ago

I don't think it's new. This happens when you try to upload a new corpus through the CorpusCreator panel, but using the https://voyant-tools.org/dtoc/ URL. I'm in the process of debugging it, and it looks like storedDocumentSources becomes null at this point: https://github.com/sgsinclair/trombone/blob/95621b02f0adff9d0cb15170538ad2a011c955f5/src/main/java/org/voyanttools/trombone/tool/build/RealCorpusCreator.java#L81-L83

sgsinclair commented 6 years ago

What URL are you trying? Is there a good public URL to use for testing?

ajmacdonald commented 6 years ago

I'm just trying to upload documents using this URL: https://voyant-tools.org/dtoc/

ajmacdonald commented 6 years ago

@sgsinclair the problem is this: https://github.com/sgsinclair/Voyant/blob/master/src/main/webapp/dtoc/index.jsp#L58 The inputFormat is defaulting to dtoc, but we're now uploading general XML. The corpus is empty because the dtoc XPaths don't return anything from the (non-dtoc) document.
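
To illustrate the failure mode described above, here is a minimal Python sketch. It is not Trombone's code: the XPath expression, the sample XML, and the `extract_documents` helper are all hypothetical stand-ins. It only shows how a format-specific XPath applied to general XML matches nothing, which would leave a corpus with no documents.

```python
import xml.etree.ElementTree as ET

# Hypothetical XPath standing in for a DToC-specific document selector.
# The real dtoc XPaths used by Voyant/Trombone are not shown in this thread.
DTOC_LIKE_XPATH = ".//div[@type='chapter']"


def extract_documents(xml_text, xpath):
    """Return the text content of every element matched by xpath."""
    root = ET.fromstring(xml_text)
    return ["".join(el.itertext()).strip() for el in root.findall(xpath)]


# Made-up XML shaped the way the hypothetical XPath expects.
dtoc_like_xml = (
    "<text>"
    "<div type='chapter'>Chapter one.</div>"
    "<div type='chapter'>Chapter two.</div>"
    "</text>"
)

# Made-up general XML with no matching elements.
generic_xml = "<book><section><p>Some text.</p></section></book>"

print(extract_documents(dtoc_like_xml, DTOC_LIKE_XPATH))  # two documents
print(extract_documents(generic_xml, DTOC_LIKE_XPATH))    # empty list
```

With the general XML, the selector returns an empty list, so nothing is left to build a corpus from.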

sgsinclair commented 6 years ago

Weird, because I'm sure I had tested with a regenerations file and I thought it wasn't working. But it does work, so I'm not sure this is a bug, though the error message could be friendlier and more informative.
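
As a rough idea of what a friendlier error might look like, here is a small Python sketch. The `EmptyCorpusError` class, the `check_corpus` function, and the message wording are assumptions made for illustration; none of them come from Voyant or Trombone.

```python
class EmptyCorpusError(ValueError):
    """Hypothetical error raised when no documents were extracted."""


def check_corpus(documents, input_format, xpath):
    """Return documents unchanged, or raise an error that names the likely cause."""
    if not documents:
        raise EmptyCorpusError(
            f"The corpus is empty: no documents matched the '{input_format}' "
            f"input format (XPath: {xpath}). If the uploaded files are not in "
            "that format, try selecting a different input format."
        )
    return documents


try:
    check_corpus([], "dtoc", ".//div[@type='chapter']")
except EmptyCorpusError as err:
    message = str(err)
    print(message)
```

Naming the input format and the selector in the message would point the user at the mismatch rather than only reporting that the corpus is empty.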