iiab / iiab

Internet-in-a-Box - Build your own LIBRARY OF ALEXANDRIA with a Raspberry Pi !
https://internet-in-a-box.org
GNU General Public License v2.0

OER2Go/RACHEL catalog went missing around 2018-06-20 [UTF-8 glitches, "Content Preview" samples null] #853

Closed holta closed 3 years ago

holta commented 6 years ago

This is very serious.

@tim-moody will start a discussion on http://community.rachelfriends.org to try to work this out with Jonathan Field (primary developer of OER2GO/RACHEL) in coming days.

See https://github.com/iiab/iiab-admin-console/issues/79

tim-moody commented 6 years ago

back on 6/27/2018

holta commented 6 years ago

OER2GO/RACHEL catalog http://dev.worldpossible.org/cgi/json_api_v1.pl has reappeared about a week later, so am resolving.

holta commented 6 years ago

Oops I missed Tim's earlier comment at https://github.com/iiab/iiab-admin-console/issues/79#issuecomment-400709281 :

> catalog is back, but not clear if links for downloads work. needs testing.

holta commented 6 years ago

@tim-moody,

@floydianslips discovered the new http://oer2go.org [provides broken links?] (permission error on their server, anyway that's what happens when he tries to work with RACHEL modules in Spanish).

Can you ask Jonathan F. to fix, once the issue is confirmed?

e.g. http://oer2go.org/mods/es-bibliofilo/

holta commented 6 years ago

@tim-moody I very much misunderstood what @floydianslips told me. Please check with him first.

For me, http://oer2go.org/viewmod/es-bibliofilo works. I am not in front of a computer so cannot try actual rsync download(s).

@floydianslips indicates IIAB's Admin Console [really??] is showing him stale "Sample" URLs like http://oer2go.org/mods/es-bibliofilo/

holta commented 6 years ago

Again I may not be understanding @floydianslips — earlier he said he got the broken link http://oer2go.org/mods/es-bibliofilo/ from Admin Console but now I'm understanding from him that he got these broken links from http://oer2go.org

In any case these broken links need investigation and he says "[IIAB Admin Console] seems to fail while installing content as well"

tim-moody commented 6 years ago

"[IIAB Admin Console] seems to fail while installing content as well" does not have enough info to be actionable.

holta commented 6 years ago

@floydianslips clarifies: "just the [http://oer2go.org] sample link[s are failing] as far as I can tell. I had to reinstall a module that failed to download all the content, but after using [IIAB Admin Console] to remove it and reinstall it all the content is there"

holta commented 6 years ago

"[IIAB Admin Console] seems to fail while installing content as well" does not have enough info to be actionable.

@tim-moody I now understand what @floydianslips meant to say:

All 56 "Sample" links are broken in Admin Console (http://box/admin) -> Install Content Tab -> Get OER2GO(RACHEL) Modules (on the right side of the page).

All 56 show the same broken link (http://box/admin/null).

Thanks Tim for investigating!

holta commented 6 years ago

FYI OER2GO / RACHEL catalog http://dev.worldpossible.org/cgi/json_api_v1.pl is currently down due to "500 Internal Server Error" :

Internal Server Error

The server encountered an internal error or misconfiguration and was unable to complete your request.

Please contact the server administrator at info@worldpossible.org to inform them of the time this error occurred, and the actions you performed just before this error.

More information about this error may be available in the server error log.
Apache/2.4.10 (Debian) Server at dev.worldpossible.org Port 80

tim-moody commented 6 years ago

some sample links are null in the catalog; this will be fixed in a future release

tim-moody commented 6 years ago

oer2go catalog has unicode errors. the latest known-good catalog is installed until these are fixed.

holta commented 6 years ago

Thanks much to @ivanistheone who is looking into this @ https://github.com/learningequality/kolibri-rachel-modules/issues/1

jamalex commented 6 years ago

Copying this in from my comment on the other issue:

It looks as if the database is fine:

mysql> SELECT title FROM modules WHERE moddir LIKE "%kolibri%";
+--------------------------------------------------------------+
| title                                                        |
+--------------------------------------------------------------+
| Kolibri                                                      |
| EngageNY (en) - Kolibri                                      |
| Delete Kolibri user data and upgrade version                 |
| Khan Academy (हिन्दी, हिंदी) - Kolibri                             |
| Khan Academy (es) - Kolibri                                  |
| Sikana (Español) - Kolibri                                   |
| Touchable Earth (en) - Kolibri                               |
| PhET Interactive Simulations (en) - Kolibri                  |
| PhET Interactive Simulations (es) - Kolibri                  |
| African Storybook - Kolibri                                  |
| EngageNY (es) - Kolibri                                      |
| Pratham Books' StoryWeaver - Kolibri                         |
| Touchable Earth (fr) - Kolibri                               |
| Khan Academy (Français, langue française) - Kolibri          |
| Sikana (Français) - Kolibri                                  |
| Sikana (English) - Kolibri                                   |
| MIT Blossoms - Kolibri                                       |
| Upgrade the Kolibri software                                 |
| Kolibri Index (English)                                      |
| TESSA - Teacher Resources - Kolibri                          |
| CK-12 - Kolibri                                              |
| Khan Academy (English) - Kolibri                             |
| Upgrade Kolibri software                                     |
+--------------------------------------------------------------+

So it seems the issue is a regression in World Possible's dev server / OER2Go PHP code, in terms of properly handling unicode. CC @needlestack and @j-schwartz.

tim-moody commented 6 years ago

"title" : "Khan Academy (Fran�ais, langue fran�aise) - Kolibri" compared to your select seems to confirm

the code is /var/www/dev/cgi/json_api_v1.pl

the comments at the end are intriguing

$r->content_type("application/json; charset=utf-8");
# surprisingly (because of my ignorance) if you specify UTF8 to JSON,
# it corrupts some (but not all!) of the UTF8 characters
my $json = JSON->new()->pretty(1)->canonical(1);
print $json->encode($modules);

holta commented 6 years ago

@tim-moody am moving this tkt to IIAB 6.7, so we can release IIAB 6.6 ASAP in coming days, and then address these critical issues in the weeks to come..

holta commented 6 years ago

Thanks @floydianslips who is helping to investigate the http://box/admin -> Install Content -> "Refresh OER2GO Catalog" button which currently responds:

GET-OER2GO-CAR FAILED and reported Unexpected error in Command GET-OER2GO-CAT

i.e. to confirm this is the same issue!?

Related: https://github.com/iiab/iiab-admin-console/issues/86

holta commented 6 years ago

CLARIF: To those who are still banging their head against the wall trying to get the "Refresh OER2GO Catalog" button to work, please rest assured that in the interim until this is properly fixed... most RACHEL/OER2GO modules are still downloadable... using a stale catalog from June/July 2018 that @tim-moody has now included as part of IIAB's default installation:

http://box/admin -> Install Content -> Get OER2GO(RACHEL) Modules

Thank you for your patience!

tim-moody commented 6 years ago

Please see iiab/iiab-admin-console#86. I manually edited the data using http://oer2go.org/cgi/checkallmods.pl.

So as of Oct 8, http://dev.worldpossible.org/cgi/json_api_v1.pl is available and no longer has bad characters.

However, the problem of null links to the sample pages remains.

"index_mod_sample_url" : null, and at the same time "logo_url" : null, are true of all entries

jamalex commented 6 years ago

Just a heads up that this isn't a sustainable solution, as Kolibri modules are generated and updated via an automated script, so it'll re-introduce those characters next time it's run.

jamalex commented 5 years ago

When I look at http://oer2go.org/cgi/checkallmods.pl and http://dev.worldpossible.org/cgi/json_api_v1.pl I see the proper Hindi and French characters -- @tim-moody did you edit them to proper unicode, or to remove those characters? If the former, I'm confused as to why it fixed it, since at the DB level it was already proper unicode. If the latter, I'm wondering how they returned.

I've held off on re-running the script to update the modules, but we can try that again to ensure we have a sustainable solution.

jamalex commented 5 years ago

@tim-moody I have done further testing and added comments/questions on your forum post: https://community.learningequality.org/t/kolibri-support-in-oer2go-catalog/830

tim-moody commented 5 years ago

@jamalex First, I misspoke elsewhere when I mentioned xml parsing; I meant json. To clarify my post above on editing, oer2go has a gui web interface to edit meta data. For the Hindi item it is

[image: screenshot of the oer2go metadata edit form for the Hindi item]

Under title you can see हिंदी, हिन्दी भाषा. I put that there, replacing the ???? which was previously there.

I know you have a database select above which shows proper unicode (note that it says हिन्दी, हिंदी, not what I put). I don't know what database that is and I have no access to the oer2go database.

Since I made that edit the Kolibri Hindi channel entry in http://dev.worldpossible.org/cgi/json_api_v1.pl is the below and not the ???? in the title as before. Similarly I edited a couple of items with c-cedilla like the Sikana item you mention in https://community.learningequality.org/t/kolibri-support-in-oer2go-catalog/830/3. So, the issue of character set is for now fixed, until someone else manually puts in non-unicode, or perhaps non-utf8, characters that do not parse in json.

{
   "age_range" : null,
   "category" : null,
   "description" : "Khan Academy content for Hindi.",
   "file_count" : "2858",
   "index_mod_sample_url" : null,
   "is_hidden" : "No",
   "ksize" : "1505896",
   "lang" : "hi",
   "logo_url" : null,
   "moddir" : "hi-kolibri-channel-khan-academy",
   "module_id" : "163",
   "prereq_id" : null,
   "prereq_note" : "",
   "rating" : "0.0",
   "rsync_url" : "rsync://dev.worldpossible.org/rachelmods/hi-kolibri-channel-khan-academy",
   "source_url" : "",
   "title" : "Khan Academy ( हिंदी, हिन्दी भाषा) - Kolibri",
   "type" : "kolibri",
   "version" : "8-a55632",
   "zip_ftp_url" : "ftp://dev.worldpossible.org/zipped-modules/hi-kolibri-channel-khan-academy.zip",
   "zip_http_url" : "http://dev.worldpossible.org/zipped-modules/hi-kolibri-channel-khan-academy.zip"
},

Here is a copy of the previous state of an item with c-cedilla

{
   "age_range" : null,
   "category" : null,
   "description" : "Khan Academy content for French.",
   "file_count" : "15293",
   "index_mod_sample_url" : null,
   "is_hidden" : "No",
   "ksize" : "20305948",
   "lang" : "fr",
   "logo_url" : null,
   "moddir" : "fr-kolibri-channel-khan-academy",
   "module_id" : "173",
   "prereq_id" : null,
   "prereq_note" : null,
   "rating" : null,
   "rsync_url" : "rsync://dev.worldpossible.org/rachelmods/fr-kolibri-channel-khan-academy",
   "source_url" : "",
   "title" : "Khan Academy (Fran�ais, langue fran�aise) - Kolibri",
   "type" : "kolibri",
   "version" : "9-a55632",
   "zip_ftp_url" : "ftp://dev.worldpossible.org/zipped-modules/fr-kolibri-channel-khan-academy.zip",
   "zip_http_url" : "http://dev.worldpossible.org/zipped-modules/fr-kolibri-channel-khan-academy.zip"
}

tim-moody commented 5 years ago

However, the problem with sample page and logo urls I also mentioned above remains, and not just for Kolibri, but for all entries.
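
One way to confirm how widespread the nulls are is to count them per field. The sketch below assumes the catalog has already been parsed into a list of module dicts shaped like the entries quoted in this thread. `count_null_fields` and the two trimmed sample records are made up for illustration, and the real top-level structure of the JSON may differ.

```python
import json
from collections import Counter

# Hypothetical, trimmed-down catalog entries; the field names follow the
# json_api_v1.pl output quoted in this thread.
SAMPLE_CATALOG = json.loads("""
[
  {"moddir": "hi-kolibri-channel-khan-academy",
   "index_mod_sample_url": null, "logo_url": null},
  {"moddir": "fr-kolibri-channel-khan-academy",
   "index_mod_sample_url": null, "logo_url": null}
]
""")


def count_null_fields(modules, fields=("index_mod_sample_url", "logo_url")):
    """Return a Counter of how many modules have each field missing or null."""
    nulls = Counter()
    for module in modules:
        for field in fields:
            if module.get(field) is None:
                nulls[field] += 1
    return nulls


if __name__ == "__main__":
    for field, count in count_null_fields(SAMPLE_CATALOG).items():
        print(f"{field}: null in {count} of {len(SAMPLE_CATALOG)} modules")
```

If every module reports a null for both fields, the gap is more likely in how the catalog is generated than in individual module records.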

tim-moody commented 5 years ago

I also reported this in http://community.rachelfriends.org/t/oer2go-catalog-fails/811

jamalex commented 5 years ago

Thanks for the additional helpful context!

> I know you have a database select above which shows proper unicode (note that it says हिन्दी, हिंदी, not what I put). I don't know what database that is and I have no access to the oer2go database.

Background: I was given an account on the dev server for setting up and streamlining the creation of modules for Kolibri.

> So, the issue of character set is for now fixed, until someone else manually puts in non-unicode, or perhaps non-utf8, characters that do not parse in json.

Not just manual editing -- I made automated scripts that run on the server to update and add Kolibri channels as RACHEL modules (directly against the DB). It's not currently running on a cronjob, but that was the plan (to keep things up to date): https://github.com/learningequality/kolibri-rachel-modules/blob/master/update_all_modules.sh

So I need to figure out why the unicode entered via the backend now gets corrupted in the process of being rendered by the OER2GO Perl scripts (it didn't used to, so something changed in the OER2GO code), whereas unicode entered via the web UI stays intact. I'm thinking it could be something to do with utf8mb4 vs utf8 in MySQL.
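
As a rough first check on the utf8mb4-vs-utf8 idea: MySQL's `utf8` charset (an alias for `utf8mb3`) stores at most 3 bytes per character, so it only matters directly if a title contains a character that needs 4 bytes in UTF-8. The sketch below measures that. The helper names are hypothetical, and it does not test a connection-charset mismatch, which could still be the culprit.

```python
def max_utf8_bytes(text: str) -> int:
    """Return the largest number of UTF-8 bytes any single character needs."""
    return max((len(ch.encode("utf-8")) for ch in text), default=0)


def needs_utf8mb4(text: str) -> bool:
    """True if the text has a character that MySQL's 3-byte utf8 cannot store."""
    return max_utf8_bytes(text) > 3


if __name__ == "__main__":
    titles = [
        "Khan Academy (Français, langue française) - Kolibri",
        "Khan Academy (हिन्दी, हिंदी) - Kolibri",
    ]
    for title in titles:
        print(max_utf8_bytes(title), needs_utf8mb4(title), title)
```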

jamalex commented 5 years ago

Interesting... when I dump the table on the backend now, after your manual edits, it's corrupted on that side:

mysql> SELECT title FROM modules WHERE moddir LIKE "%kolibri%";
+----------------------------------------------------------------------------------------------------------------------------+
| title                                                                                                                      |
+----------------------------------------------------------------------------------------------------------------------------+
| Kolibri                                                                                                                    |
| EngageNY (en) - Kolibri                                                                                                    |
| Delete Kolibri user data and upgrade versioदी भाषा) - Kolibri                                                 ||
| Khan Academy (es) - Kolibri                                                                                                |
| Sikana (Español) - Kolibri                                                                                                |
| Touchable Earth (en) - Kolibri                                                                                             |
| PhET Interactive Simulations (en) - Kolibri                                                                                |
| PhET Interactive Simulations (es) - Kolibri                                                                                |
| African Storybook - Kolibri                                                                                                |
| EngageNY (es) - Kolibri                                                                                                    |
| Pratham Books' StoryWeaver - Kolibri                                                                                       |
| Touchable Earth (fr) - Kolibri                                                                                             |
| Khan Academy (Français, langue française) - Kolibri                                                                      |
| Sikana (Français) - Kolibri                                                                                               |
| Sikana (English) - Kolibri                                                                                                 |
| MIT Blossoms - Kolibri                                                                                                     |
| Upgrade the Kolibri software                                                                                               |
| Kolibri Index (English)                                                                                                    |
| TESSA - Teacher Resources - Kolibri                                                                                        |
| CK-12 - Kolibri                                                                                                            |
| Khan Academy (English) - Kolibri                                                                                           |
| Upgrade Kolibri software                                                                                                   |
+----------------------------------------------------------------------------------------------------------------------------+
23 rows in set (0.05 sec)

jamalex commented 5 years ago

I see a few options:

  1. @needlestack or Steve Bashford could take a look and work out the kinks in the Perl code to avoid muddling up the character encoding in between the database and the frontend (this would be the preferred solution)
  2. I remove all special characters from the titles used in module names, within my script (e.g. just have it be "Khan Academy (French) - Kolibri")
  3. I determine how to "encode" (incorrectly) the strings I'm inserting into the database so that it will be decoded into usable unicode in the frontend.

tim-moody commented 5 years ago

are we sure we are looking at the same database? unfortunately I have a login but not even readonly access to the database. you may already know that the perl api is /var/www/dev/cgi/json_api_v1.pl

the lines at the end worry me:

$r->content_type("application/json; charset=utf-8");
# surprisingly (because of my ignorance) if you specify UTF8 to JSON,
# it corrupts some (but not all!) of the UTF8 characters

of your options, I wouldn't do 2 or 3

could add 4, rewrite in python. I have offered to help but need access to the db.
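
For a sense of what option 4 might look like, here is a minimal sketch of the serialization step only. It assumes the database driver already returns decoded `str` values. The `rows` list is a hard-coded stand-in for a real query, and `encode_catalog` is a hypothetical name, not part of the existing API.

```python
import json


def encode_catalog(modules):
    """Serialize module rows to UTF-8 JSON bytes, leaving non-ASCII text readable."""
    # sort_keys and indent roughly mirror the Perl script's canonical(1)/pretty(1).
    text = json.dumps(modules, ensure_ascii=False, indent=3, sort_keys=True)
    return text.encode("utf-8")


# Stand-in for rows fetched from the modules table (assumption: already str).
rows = [
    {
        "moddir": "fr-kolibri-channel-khan-academy",
        "title": "Khan Academy (Français, langue française) - Kolibri",
    }
]

body = encode_catalog(rows)

if __name__ == "__main__":
    print(body.decode("utf-8"))
```

Encoding to bytes exactly once, at the output boundary, is what keeps the text from being encoded twice or not at all.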

by the way I have seen Fran�ais before. I don't think it is utf8. I have seen it on Macs.

The main problem with non-latin characters is we're never sure whether it's the source or the app rendering it that's the problem.

tim-moody commented 5 years ago

actually, where's the Hindi in your dumped database? Ah, has Kolibri, not kolibri.

Please dump that so we can see what happened. I copy/pasted that from Google translate.

jamalex commented 5 years ago

It's matching against the module name, which does contain "kolibri" (lowercase).

It seems the characters for Hindi, when printed to the console, trigger some backspace characters and hence it doesn't even display the row correctly:

mysql> SELECT moddir, title FROM modules WHERE moddir LIKE "hi-kolibri%";
+---------------------------------+----------------------------------------------------------------------------------------------------------------------------+
| moddir                          | title                                                                                                                      |
+---------------------------------+--------------------------------------------दी भाषा) - Kolibri                                                 |+
+---------------------------------+----------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

jamalex commented 5 years ago

> by the way I have seen Fran�ais before. I don't think it is utf8. I have seen it on Macs.

It's getting corrupted -- "�" is just a placeholder. It was correct UTF-8 in the database itself, but not by the time it reached the frontend and was rendered.
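
To illustrate how such a placeholder can appear, the sketch below shows one common route: text encoded as Latin-1 somewhere along the way and then read back as UTF-8, with invalid bytes replaced by U+FFFD. This is only one plausible mechanism. Nothing in this thread confirms it is what the OER2Go Perl code actually does, and `misdecode_latin1_as_utf8` is a made-up helper.

```python
def misdecode_latin1_as_utf8(text: str) -> str:
    """Encode text as Latin-1, then decode the bytes as if they were UTF-8."""
    raw = text.encode("latin-1")
    # Bytes that are not valid UTF-8 are replaced with U+FFFD, the placeholder.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(misdecode_latin1_as_utf8("Khan Academy (Français, langue française) - Kolibri"))
```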

jamalex commented 5 years ago

> could add 4, rewrite in python. I have offered to help but need access to the db.

If you already have server access, you should be able to access the DB -- I'll PM you some details.

jamalex commented 5 years ago

Sent details to you on Slack at https://offline-internet.slack.com/messages/DDBAZ78BV/

jamalex commented 5 years ago

I won't have time this week to mess with any code myself (prepping for a board meeting on Friday), but let me know if there's any useful context I can provide from our side. My script (linked above) directly inserts/updates modules via SQL.

tim-moody commented 5 years ago

stranger still, for title I get Khan Academy ( हिंदी, हिन्दी भाषा) - Kolibri in the mysql client, so for me the data looks right.

tim-moody commented 5 years ago

mysql> SELECT title FROM modules WHERE moddir LIKE "%olibri%";
+----------------------------------------------------------------------------+
| title                                                                      |
+----------------------------------------------------------------------------+
| Kolibri                                                                    |
| EngageNY (en) - Kolibri                                                    |
| Delete Kolibri user data and upgrade version                               |
| Khan Academy ( हिंदी, हिन्दी भाषा) - Kolibri                               |
| Khan Academy (es) - Kolibri                                                |
| Sikana (Español) - Kolibri                                                 |
| Touchable Earth (en) - Kolibri                                             |
| PhET Interactive Simulations (en) - Kolibri                                |
| PhET Interactive Simulations (es) - Kolibri                                |
| African Storybook - Kolibri                                                |
| EngageNY (es) - Kolibri                                                    |
| Pratham Books' StoryWeaver - Kolibri                                       |
| Touchable Earth (fr) - Kolibri                                             |
| Khan Academy (Français, langue française) - Kolibri                        |
| Sikana (Français) - Kolibri                                                |
| Sikana (English) - Kolibri                                                 |
| MIT Blossoms - Kolibri                                                     |
| Upgrade the Kolibri software                                               |
| Kolibri Index (English)                                                    |
| TESSA - Teacher Resources - Kolibri                                        |
| CK-12 - Kolibri                                                            |
| Khan Academy (English) - Kolibri                                           |
| Upgrade Kolibri software                                                   |
+----------------------------------------------------------------------------+
23 rows in set (0.00 sec)

looks good to me.

tim-moody commented 5 years ago

I am logged in using PuTTY on Win10
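
The two differing console dumps may come down to how each terminal handles combining marks or stray control characters rather than to the stored data. A small sketch for listing such code points in a title follows. `marks_and_controls` is a hypothetical helper, and whether this explains the garbled row seen earlier is an assumption, not something established in this thread.

```python
import unicodedata


def marks_and_controls(text):
    """List combining marks (category M*) and control characters (Cc) in text."""
    found = []
    for ch in text:
        category = unicodedata.category(ch)
        if category.startswith("M") or category == "Cc":
            # Control characters have no Unicode name, hence the default.
            name = unicodedata.name(ch, "<unnamed>")
            found.append((f"U+{ord(ch):04X}", category, name))
    return found


if __name__ == "__main__":
    for entry in marks_and_controls("Khan Academy ( हिंदी, हिन्दी भाषा) - Kolibri"):
        print(entry)
```

Combining marks take no column of their own in many terminals, so a table padded by character count can look misaligned even when the data is intact.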

tim-moody commented 5 years ago

sample and logo urls remain null in the current catalog. logo is in the api v2 catalog and could be merged

tim-moody commented 5 years ago

I have not seen Kolibri issues for some time, but I have not seen any new channels either. The nulls for logo and sample preview remain.

holta commented 3 years ago

It's been a long road but thankfully this/these issue(s) appear almost resolved after 3 years.

Thanks to progress like:

holta commented 3 years ago

Let's declare victory.

Thanks to @tim-moody for the very hard work here.

Over almost 1/3 of a decade!