ushahidi / platform

Ushahidi Platform API version 3+
http://ushahidi.com

CREES Research #2572

Closed caharding closed 6 years ago

caharding commented 6 years ago

CREES is an automatic tagging service: posts are tagged automatically by an ML tool. Our grant requires that our partner, the Open University, train data models for two use cases: Crisis and Elections.

We have an existing data model working for crisis scenarios. For our grant we need to hand over a new data set so that the Open University can train an additional model.

  • The Open University has a data set on Hurricane Harvey.
  • Ushahidi has a data set on Uchaguzi.

We need to make sure the elections data is structured so that it can be used in the OU ML tool.

Shadrock commented 6 years ago

This is a placeholder issue to hold 10% time for @willdoran or another dev to continue with some CREES research during the April and early May cycle, as outlined in the roadmap. The work will be assistance in gathering or structuring data from Uchaguzi so that the Open University can create a new data model specific to elections. It should be fairly light, with relatively little coding, but we wanted to be sure to preserve the time!

caharding commented 6 years ago

@Shadrock to add additional context on this scope and to open a bug for additional technical info from the partner.

crcommons commented 6 years ago

Just noting that this issue will be on a short pause until Will returns from vacation.

Shadrock commented 6 years ago

For @willdoran (when he gets back!).

This feedback on potential CREES integration issues came to me via e-mail from @evhart. The issues were compiled as part of an evaluation that was done, I believe, as a simulation with students in the Netherlands as part of COMRADES. Several Ushahidi staff supported this simulation: Robbie, Will, and David were involved specifically.

  1. There were many examples where the related/non-related categories were the only categories present (information types missing).
  2. Service name displayed rather than the category (e.g., 'Info Type' instead of one of the possible information types categories).
  3. Different categories were returned for the same post. This was mentioned by students in their report. This should not be possible. Could it be due to using CREES and CREES Uchaguzi together?

Another issue, found when exporting the Uchaguzi data using SQL for Kenny (from COMRADES), was related to the encoding of the LAT/LON data. We had some discussion with Will, Robbie and David Losada about it, but we could not get the correct LON/LAT data. See the SQL example below:

SELECT ASTEXT(value) as location, x(value) as x, y(value) as y FROM post_point LIMIT 10 

location x y
POINT(4.764590900135562e19 -1.3068462330237261e-78) 4.764590900135562e19 -1.3068462330237261e-78
POINT(4.764590900135562e19 -1.3068462330237261e-78) 4.764590900135562e19 -1.3068462330237261e-78
POINT(-1.189199587984945e18 7.612047485286466e218) -1.189199587984945e18 7.612047485286466e218
POINT(-1.189199587984945e18 7.612047485286466e218) -1.189199587984945e18 7.612047485286466e218
POINT(-1.189199587984945e18 7.612047485286466e218) -1.189199587984945e18 7.612047485286466e218
POINT(-2.1201109403518694e167 -1.2673912903807526e166) -2.1201109403518694e167 -1.2673912903807526e166
POINT(4.764590900135562e19 -1.3068462330237261e-78) 4.764590900135562e19 -1.3068462330237261e-78
POINT(-11665962360.439825 -2.5803972529263045e-165) -11665962360.439825 -2.5803972529263045e-165
POINT(-3.27237135808076e-194 -1.18416047305313e-190) -3.27237135808076e-194 -1.18416047305313e-190
POINT(-1.189199587984945e18 7.612047485286466e218) -1.189199587984945e18 7.612047485286466e218
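
A quick way to flag rows like the ones above before handing an export over is to parse the WKT text and check each coordinate against the valid longitude/latitude range. The following is a minimal sketch, not part of the Ushahidi codebase: the function names are hypothetical, it assumes x is longitude and y is latitude, and the "tiny non-zero value" check is a heuristic added here because one of the rows above falls inside the valid range but is clearly not a real coordinate.

```python
import re

# Matches WKT text such as "POINT(36.82 -1.29)", including exponent notation.
_POINT_RE = re.compile(r"^POINT\(\s*([^\s()]+)\s+([^\s()]+)\s*\)$")


def parse_point(wkt):
    """Return (x, y) as floats from a WKT POINT string."""
    match = _POINT_RE.match(wkt.strip())
    if match is None:
        raise ValueError("not a WKT POINT: %r" % wkt)
    return float(match.group(1)), float(match.group(2))


def is_plausible_lon_lat(x, y, tiny=1e-9):
    """Heuristic check, assuming x is longitude and y is latitude.

    Rejects values outside [-180, 180] / [-90, 90], and also non-zero
    values smaller in magnitude than `tiny` (an assumed threshold),
    which suggest a misread binary value rather than a real position.
    """
    in_range = -180.0 <= x <= 180.0 and -90.0 <= y <= 90.0
    suspiciously_tiny = any(v != 0.0 and abs(v) < tiny for v in (x, y))
    return in_range and not suspiciously_tiny


def find_bad_points(wkt_rows):
    """Return the rows whose coordinates fail the plausibility check."""
    return [row for row in wkt_rows if not is_plausible_lon_lat(*parse_point(row))]
```

Passing the `location` column of the export to `find_bad_points` returns the rows that need attention; with the sample output above, every row would be flagged. This only detects the problem and says nothing about its cause.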

willdoran commented 6 years ago

@Shadrock Would it be possible to get links to example posts for the three issues above? I can use the text of those posts to see what's happening specifically.

Shadrock commented 6 years ago

@evhart can you comment on the above, please? Do you have links to the post examples?

Shadrock commented 6 years ago

@willdoran @jrtricafort not sure if compiling Uchaguzi data should be a separate issue or go here. It will be used to train a new CREES model... so it's related to this ticket. Please advise if I need to create a new issue or provide further details.

evhart commented 6 years ago

@willdoran @Shadrock From the COMRADES test instance, I have the following examples:

  1. Information types missing:
  2. Service name displayed:
    • I could not find an example so maybe it got fixed? (my comments date from December so I do not remember exactly where I saw it).
  3. Different categories for the same post:
    • It was reported by Kenny so I cannot see it myself. My guess is that it is because the standard version of CREES and the Uchaguzi version were used at the same time, and both versions can return the same label but are fitted for different types of events. Could it be due to a race condition between the two services?

Shadrock commented 6 years ago

@willdoran can you comment on whether we've addressed the bugs in this thread? Once that's done, we can close this issue. I do not foresee any further work on CREES for the COMRADES project.

willdoran commented 6 years ago

@Shadrock We haven't addressed them yet, but I'll move them up and look at them next week during the retreat. I believe I have fixed them but I need to confirm this on the Comrades deployments. I'll update the issue when I test.

willdoran commented 6 years ago

@Shadrock I believe these issues are fixed, as I couldn't reproduce them on the newest setup of Comrades (the one that contains HDX).

Shadrock commented 6 years ago

Excellent, thanks @willdoran. I'm good to close this out and then re-open if they come up again.