alexanderpanchenko closed this issue 6 years ago.
Although the new depcc index contains more sentences, fewer hits are found for the triples (when searching only in the text field: text:()). What is the difference between the two indexes?
And how do I access the new index? When I use http://ltdemos.informatik.uni-hamburg.de/depcc-index/depcc/_search?q=text:watermelon%20AND%20apple%20AND%20sugar instead of http://ltdemos.informatik.uni-hamburg.de/depcc-index/commoncrawl2/_search?q=text:watermelon%20AND%20apple%20AND%20sugar, 56 hits are found, whereas Kibana shows 46 hits for depcc and 113 hits for commoncrawl.
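To compare the two indices outside Kibana, a small script can send the same URI search to both. This is a minimal sketch, assuming the `_search` endpoints from the question are reachable; `build_search_url` and `count_hits` are hypothetical helper names. One detail worth checking: in Lucene query syntax, a field prefix applies only to the term directly after it. As a result, `text:watermelon AND apple AND sugar` restricts only `watermelon` to `text`, while `text:(watermelon AND apple AND sugar)` restricts all three terms. This may or may not explain the differing counts.

```python
import json
from typing import Sequence
from urllib.parse import quote
from urllib.request import urlopen

# Base URL of the Elasticsearch proxy, taken from the URLs in the question.
BASE_URL = "http://ltdemos.informatik.uni-hamburg.de/depcc-index"


def build_search_url(index: str, terms: Sequence[str], field: str = "text") -> str:
    """Build a URI-search URL that restricts *every* term to `field`."""
    # Parentheses group the terms so the field prefix applies to all of them.
    query = f"{field}:(" + " AND ".join(terms) + ")"
    return f"{BASE_URL}/{index}/_search?q={quote(query)}"


def count_hits(url: str) -> int:
    """Run the search (requires network access) and return the total hit count."""
    with urlopen(url) as response:
        body = json.load(response)
    total = body["hits"]["total"]
    # Elasticsearch 6.x reports an int; 7.x and later report {"value": ..., "relation": ...}.
    return total["value"] if isinstance(total, dict) else total
```

Calling `count_hits(build_search_url(index, ["watermelon", "apple", "sugar"]))` once for `depcc` and once for `commoncrawl2` gives totals for the identical query, which makes the comparison with Kibana easier to reason about.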
Hello,
Yes, this is normal. The new index is still in preparation (new documents are still being added). The main differences between the old and new indices are:
Because the index is still in preparation, I suggest making the index backend configurable so that both indices can be used (with links to the source documents shown whenever a URI is available). For now, I think it makes sense to keep the current version at /cam, but to deploy the system based on the new 'depcc' index to /cam2 or /cam3, so that we can discuss concretely how to implement the links in the UI.
On Jul 23, 2018, at 8:09 AM, Matthias Schildwächter notifications@github.com wrote:
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/uhh-lt/cam/issues/82#issuecomment-406949772, or mute the thread https://github.com/notifications/unsubscribe-auth/ABY6voGW_kUcrb730SP20CMSdzBLumELks5uJWi3gaJpZM4VZue0.
Alright, thanks for updating the depcc index. What is the reason for not filtering out the duplicates in that index?
I introduced two versions of presenting the corresponding source:
1. A button is shown after every sentence; it displays the source on hover and opens the source URL in a new tab on click.
2. The sentence itself is clickable and displays the source on hover; when the sentence is clicked, the source is loaded in a new tab.
The versions are deployed here: http://ltdemos.informatik.uni-hamburg.de/cam3/#/ The sentences on the left follow the first approach and the sentences on the right the second one. As long as we don't need to mark anything in the sentences, I would prefer the second approach.
On Jul 23, 2018, at 4:56 PM, Matthias Schildwächter notifications@github.com wrote:
Just because the sources may be different.
Ok, thanks, I will check it out.
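The reason given for keeping duplicates is that identical sentences may come from different sources. If the UI ever needs to collapse repeats, a source-aware variant could drop a sentence only when both its text and its source match. This is a hypothetical sketch, not how the depcc index or CAM handles duplicates. It assumes each hit is an Elasticsearch `_source` dict with a `text` field and, for depcc, the `document_id` field holding the source URI; `dedupe_by_source` is an illustrative name.

```python
from typing import Dict, Iterable, List, Set, Tuple


def dedupe_by_source(hits: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first occurrence of each (sentence, source) pair, preserving order.

    Identical sentences with different `document_id` values are all kept,
    because each one points to a different source document.
    """
    seen: Set[Tuple[str, str]] = set()
    unique: List[Dict[str, str]] = []
    for hit in hits:
        # Hits without a URI (e.g. from commoncrawl2) share the empty-string source.
        key = (hit["text"].strip(), hit.get("document_id") or "")
        if key not in seen:
            seen.add(key)
            unique.append(hit)
    return unique
```

Keying on the pair rather than on the text alone keeps every distinct source link available to the user, which is the point of the new index.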
I looked at both versions. Both are actually quite nice, great job!
I think for now let's keep the version on the right (with the underlines). It is also a bit less transparent with respect to the type of the index: if no URI is available, the underline is simply not shown (and nothing else changes visually).
On Jul 23, 2018, at 5:07 PM, Alexander Panchenko panchenkoalexander@gmail.com wrote:
Yes, good point about the transparency. I will update the deployed version to use only the presentation on the right.
What did you mean by making the system configurable for both indexes? Something like a slider that selects depcc/commoncrawl2 (like the one for fast search)?
On Jul 23, 2018, at 5:24 PM, Matthias Schildwächter notifications@github.com wrote:
Ok.
No, just at startup, via a config file.
Deployed it again.
Ok, I will look into how to use a config file in Python.
Just use a JSON file (or YAML) to store the most important settings.
On Jul 23, 2018, at 6:04 PM, Matthias Schildwächter notifications@github.com wrote:
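Following the suggestion to keep the most important settings in a JSON file, a startup config could select the index once, without any UI control. This is a minimal sketch. The file name `config.json`, the keys `elasticsearch_url` and `index`, and the helper names are hypothetical. YAML would work the same way with a YAML parser.

```python
import json
from pathlib import Path
from typing import Any, Dict, Mapping

# Hypothetical defaults: the new depcc index behind the proxy URL from this thread.
DEFAULT_CONFIG: Dict[str, Any] = {
    "elasticsearch_url": "http://ltdemos.informatik.uni-hamburg.de/depcc-index",
    "index": "depcc",
}
KNOWN_INDICES = ("depcc", "commoncrawl2")


def resolve_config(overrides: Mapping[str, Any]) -> Dict[str, Any]:
    """Merge user overrides into the defaults and validate the index name."""
    config = {**DEFAULT_CONFIG, **overrides}  # new dict; defaults stay untouched
    if config["index"] not in KNOWN_INDICES:
        raise ValueError(f"unknown index {config['index']!r}; expected one of {KNOWN_INDICES}")
    return config


def load_config(path: str = "config.json") -> Dict[str, Any]:
    """Read settings once at startup; fall back to the defaults if the file is missing."""
    file = Path(path)
    overrides = json.loads(file.read_text(encoding="utf-8")) if file.is_file() else {}
    return resolve_config(overrides)
```

With this setup, a deployment at /cam using `{"index": "commoncrawl2"}` and one at /cam3 using `{"index": "depcc"}` would differ only in their config files.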
I would like to merge this branch into master and close this issue if you don't have any concerns left.
Okay
On Fri 27. Jul 2018 at 19:03, Matthias Schildwächter <notifications@github.com> wrote:
The new index (depcc*) contains the field 'document_id', which holds the URI of the source document.
A hyperlink to the source document should be added to the interface so that a user can go to the original document and see the sentence containing the comparison in its context. This UI element is meant to ensure the users' trust: the sentences do not come from some random source but can be inspected in context.
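The requested behavior can be sketched as a small rendering helper. The sentence becomes a link when `document_id` holds a URI and stays plain text otherwise, matching the "underline only if a URI exists" version discussed in the thread. This is an illustrative sketch in Python, not the CAM frontend code. The function name, the HTML markup, and the restriction to `http`/`https` URIs are assumptions.

```python
from html import escape
from typing import Mapping, Optional


def render_sentence(hit: Mapping[str, Optional[str]]) -> str:
    """Return HTML for one sentence, linked to its source only when a URI exists.

    `hit` is assumed to be an Elasticsearch `_source` dict with a "text" field
    and, for the depcc index, a "document_id" field holding the source URI.
    """
    text = escape(hit["text"] or "")
    uri = hit.get("document_id") or ""
    if not uri.startswith(("http://", "https://")):
        # No usable URI (e.g. commoncrawl2): plain text, so nothing changes visually.
        return f"<span>{text}</span>"
    href = escape(uri)  # escapes &, <, > and quotes for use in attribute values
    # The title attribute shows the source URI on hover; target="_blank" opens a new tab.
    return (
        f'<a class="source-link" href="{href}" title="{href}" '
        f'target="_blank" rel="noopener noreferrer">{text}</a>'
    )
```

Only `http`/`https` URIs are linked, so an unexpected value in `document_id` falls back to plain text instead of producing a clickable `javascript:` link.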