uhh-lt / wsd

A system for unsupervised knowledge-free interpretable word sense disambiguation based on distributional semantics
http://jobimtext.org/wsd
GNU General Public License v3.0

Any way to reduce the size of the model by removing the image database? #11

Status: Open. PC09 opened this issue 6 years ago.

PC09 commented 6 years ago

I only need the API endpoints, not the GUI where images are also loaded. For example:

Endpoint: /predictSense
Example request:
curl -H "Content-Type: application/json" \
  -X POST \
  -d '{"context":"Java is an island.","word":"Java", "model": "simwords"}' \
  $YOUR_API_SERVER/predictWordSense

I saw that imgdata/data stores data. Is there a way to remove these 135 GB of data and still keep the API working?
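For readers who, like PC09, only need the HTTP API: the curl request above can also be issued from Python's standard library. This is a minimal sketch, not an official client. It assumes the /predictWordSense path from the curl example (the thread also mentions /predictSense), a base URL of your own in place of $YOUR_API_SERVER, and a JSON response body, which the thread does not show. The function names are hypothetical.

```python
import json
import urllib.request


def build_predict_request(server: str, context: str, word: str,
                          model: str = "simwords") -> urllib.request.Request:
    """Build the same POST request as the curl example (hypothetical helper)."""
    payload = {"context": context, "word": word, "model": model}
    return urllib.request.Request(
        url=server.rstrip("/") + "/predictWordSense",  # path taken from the curl example
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict_word_sense(server: str, context: str, word: str,
                       model: str = "simwords", timeout: float = 30.0):
    """Send the request; assumes (unverified) that the server replies with JSON."""
    request = build_predict_request(server, context, word, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage: `predict_word_sense("http://localhost:9000", "Java is an island.", "Java")`, where the host and port are placeholders for your own deployment.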

alexanderpanchenko commented 6 years ago

Thanks for your interest! Unfortunately, there is currently no out-of-the-box way to get rid of the images, but removing them should be straightforward. I suggest deploying a local system and removing the "unnecessary parts" from it.

(Sent by email in reply to PC09's message of Wed, Sep 5, 2018, 1:55 PM.)

PC09 commented 6 years ago

Okay. I ran a few commands to check the contents of the imgdata database.

Command: sudo docker-compose exec db psql -U postgres -c '\l+'

Output:

List of databases
    Name     |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges   |  Size   | Tablespace |                Description
-------------+----------+----------+-------------+-------------+-----------------------+---------+------------+--------------------------------------------
 postgres    | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |                       | 6976 kB | pg_default | default administrative connection database
 template0   | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +| 6857 kB | pg_default | unmodifiable empty database
             |          |          |             |             | postgres=CTc/postgres |         |            |
 template1   | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +| 6857 kB | pg_default | default template for new databases
             |          |          |             |             | postgres=CTc/postgres |         |            |
 wsp_default | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |                       | 134 GB  | pg_default |
(4 rows)

When I use the wsp_default database and list its tables with sudo docker-compose exec db psql -U postgres -c '\dt+ .', the output only shows tables whose sizes are in kB. So how does it add up to 134 GB? Is my output incorrect, or am I missing some tables?

       Schema       |          Name           | Type  |  Owner   |    Size    | Description
--------------------+-------------------------+-------+----------+------------+-------------
 information_schema | sql_features            | table | postgres | 96 kB      |
 information_schema | sql_implementation_info | table | postgres | 48 kB      |
 information_schema | sql_languages           | table | postgres | 48 kB      |
 information_schema | sql_packages            | table | postgres | 48 kB      |
 information_schema | sql_parts               | table | postgres | 48 kB      |
 information_schema | sql_sizing              | table | postgres | 48 kB      |
 information_schema | sql_sizing_profiles     | table | postgres | 8192 bytes |
 pg_catalog         | pg_aggregate            | table | postgres | 48 kB      |
 pg_catalog         | pg_am                   | table | postgres | 40 kB      |
 pg_catalog         | pg_amop                 | table | postgres | 80 kB      |
 pg_catalog         | pg_amproc               | table | postgres | 64 kB      |
 pg_catalog         | pg_attrdef              | table | postgres | 8192 bytes |
 pg_catalog         | pg_attribute            | table | postgres | 392 kB     |
 pg_catalog         | pg_auth_members         | table | postgres | 0 bytes    |
 pg_catalog         | pg_authid               | table | postgres | 40 kB      |
 pg_catalog         | pg_cast                 | table | postgres | 48 kB      |
 pg_catalog         | pg_class                | table | postgres | 136 kB     |
 pg_catalog         | pg_collation            | table | postgres | 40 kB      |
 pg_catalog         | pg_constraint           | table | postgres | 48 kB      |
 pg_catalog         | pg_conversion           | table | postgres | 56 kB      |
 pg_catalog         | pg_database             | table | postgres | 8192 bytes |
 pg_catalog         | pg_db_role_setting      | table | postgres | 8192 bytes |
 pg_catalog         | pg_default_acl          | table | postgres | 0 bytes    |
 pg_catalog         | pg_depend               | table | postgres | 464 kB     |
 pg_catalog         | pg_description          | table | postgres | 312 kB     |
 pg_catalog         | pg_enum                 | table | postgres | 0 bytes    |
 pg_catalog         | pg_event_trigger        | table | postgres | 0 bytes    |
 pg_catalog         | pg_extension            | table | postgres | 40 kB      |
 pg_catalog         | pg_foreign_data_wrapper | table | postgres | 0 bytes    |
 pg_catalog         | pg_foreign_server       | table | postgres | 0 bytes    |
 pg_catalog         | pg_foreign_table        | table | postgres | 0 bytes    |
 pg_catalog         | pg_index                | table | postgres | 56 kB      |
 pg_catalog         | pg_inherits             | table | postgres | 0 bytes    |
 pg_catalog         | pg_language             | table | postgres | 40 kB      |
 pg_catalog         | pg_largeobject          | table | postgres | 0 bytes    |
 pg_catalog         | pg_largeobject_metadata | table | postgres | 0 bytes    |
 pg_catalog         | pg_namespace            | table | postgres | 40 kB      |
 pg_catalog         | pg_opclass              | table | postgres | 56 kB      |
 pg_catalog         | pg_operator             | table | postgres | 152 kB     |
 pg_catalog         | pg_opfamily             | table | postgres | 48 kB      |
 pg_catalog         | pg_pltemplate           | table | postgres | 40 kB      |
 pg_catalog         | pg_policy               | table | postgres | 0 bytes    |
 pg_catalog         | pg_proc                 | table | postgres | 608 kB     |
 pg_catalog         | pg_range                | table | postgres | 40 kB      |
 pg_catalog         | pg_replication_origin   | table | postgres | 0 bytes    |
 pg_catalog         | pg_rewrite              | table | postgres | 544 kB     |
 pg_catalog         | pg_seclabel             | table | postgres | 8192 bytes |
 pg_catalog         | pg_shdepend             | table | postgres | 40 kB      |
 pg_catalog         | pg_shdescription        | table | postgres | 48 kB      |
 pg_catalog         | pg_shseclabel           | table | postgres | 8192 bytes |
 pg_catalog         | pg_statistic            | table | postgres | 224 kB     |
 pg_catalog         | pg_tablespace           | table | postgres | 40 kB      |
 pg_catalog         | pg_transform            | table | postgres | 0 bytes    |
 pg_catalog         | pg_trigger              | table | postgres | 8192 bytes |
 pg_catalog         | pg_ts_config            | table | postgres | 40 kB      |
 pg_catalog         | pg_ts_config_map        | table | postgres | 48 kB      |
 pg_catalog         | pg_ts_dict              | table | postgres | 40 kB      |
 pg_catalog         | pg_ts_parser            | table | postgres | 40 kB      |
 pg_catalog         | pg_ts_template          | table | postgres | 40 kB      |
 pg_catalog         | pg_type                 | table | postgres | 96 kB      |
 pg_catalog         | pg_user_mapping         | table | postgres | 0 bytes    |
(61 rows)

If possible, is there a way to find the tables that store images?
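The sizes in the `\l+` and `\dt+` output appear to be formatted the way PostgreSQL's `pg_size_pretty()` formats them, with binary multiples (1 kB = 1024 bytes). Converting a few of them back to bytes makes the mismatch concrete: even the largest tables in the listing add up to a few megabytes, nowhere near 134 GB, so the space is presumably held by relations that this listing does not include. A small Python sketch; `parse_pg_size` is a hypothetical helper, and the sample values are copied from the `\dt+` output above.

```python
import re

# Binary multipliers, matching the units printed by pg_size_pretty().
_UNITS = {"bytes": 1, "kB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4, "PB": 1024**5}
_SIZE_RE = re.compile(r"^\s*(\d+)\s*(bytes|kB|MB|GB|TB|PB)\s*$")


def parse_pg_size(text: str) -> int:
    """Convert a size string such as '608 kB' or '134 GB' to a byte count."""
    match = _SIZE_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    value, unit = match.groups()
    return int(value) * _UNITS[unit]


# The six largest tables in the \dt+ listing:
# pg_proc, pg_rewrite, pg_depend, pg_attribute, pg_description, pg_statistic.
largest_listed = ["608 kB", "544 kB", "464 kB", "392 kB", "312 kB", "224 kB"]
listed_total = sum(parse_pg_size(size) for size in largest_listed)

# Size of wsp_default as reported by \l+.
database_total = parse_pg_size("134 GB")
```

The remaining rows of the listing are 96 kB or smaller, so including them does not change the order of magnitude.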

alexanderpanchenko commented 6 years ago

As far as I know, no information about images is stored in the database.

@fmarten, please correct me if I am wrong.

PC09 commented 6 years ago

Oh, so in that case, is imgdata/ (which holds 18 GB of data) the only place where images are stored? My aim is to remove the images and their related data to cut down the size of the complete package. Can you please let me know where all image-related information is stored, so that I can try removing it and check whether the API still works and returns text output? TIA
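To check how much space removing imgdata/ would actually free, its size can be measured before deleting anything. A minimal Python sketch using only the standard library; the `imgdata` path is an assumption about where the directory sits in your checkout, and the function names are hypothetical.

```python
import os


def directory_size_bytes(root: str) -> int:
    """Sum the apparent sizes of all regular files below root (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # do not count link targets, which may live elsewhere
            try:
                total += os.path.getsize(path)
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total


def format_gib(num_bytes: int) -> str:
    """Format a byte count in GiB (1024**3 bytes) with one decimal place."""
    return f"{num_bytes / 1024**3:.1f} GiB"
```

Usage: `print(format_gib(directory_size_bytes("imgdata")))`. From the shell, `du -sh imgdata/` gives a similar figure, although du counts allocated disk blocks rather than apparent file sizes, so the two numbers can differ slightly.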

alexanderpanchenko commented 6 years ago

I think you can delete the images; the API only sends users the paths to them anyway, not the binaries.

(Sent by email in reply to PC09's message of 6 Sep 2018, 15:52.)

PC09 commented 6 years ago

Thanks for your quick reply! If the database does not store any image data, then why is the wsp_default database 134 GB? What exactly is stored in the db?

wsp_default | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | | 134 GB | pg_default |

TIA
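One way to answer "what is stored in the db" is to ask PostgreSQL directly for the largest relations in wsp_default. Note that when psql is started without a database name, it connects to the database named after the user, so `psql -U postgres -c '\dt+ ...'` may have listed the `postgres` database rather than wsp_default; that would fit a listing made up only of `pg_catalog` and `information_schema` tables. The Python sketch below assumes the `db` service and `postgres` user from the commands in this thread and passes `-d wsp_default` explicitly. The query uses only standard catalog functions (`pg_total_relation_size`, `pg_size_pretty`); the helper names are hypothetical.

```python
import subprocess
from typing import List

# Largest ordinary tables and materialized views outside the system schemas;
# pg_total_relation_size() includes indexes and TOAST data.
LARGEST_RELATIONS_SQL = """
SELECT n.nspname AS schema,
       c.relname AS name,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM pg_class AS c
JOIN pg_namespace AS n ON n.oid = c.relnamespace
WHERE c.relkind IN ('r', 'm')
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(c.oid) DESC
LIMIT 20;
"""


def build_psql_command(database: str = "wsp_default",
                       service: str = "db",
                       user: str = "postgres") -> List[str]:
    """Hypothetical helper: argv for running the query inside the compose service."""
    return [
        "docker-compose", "exec", "-T", service,  # -T: no pseudo-TTY, suitable for scripts
        "psql", "-U", user,
        "-d", database,  # connect to wsp_default explicitly, not the default "postgres" DB
        "-c", LARGEST_RELATIONS_SQL,
    ]


def show_largest_relations(database: str = "wsp_default") -> None:
    """Run the query; requires docker-compose and a running db service."""
    subprocess.run(build_psql_command(database), check=True)
```

Calling `show_largest_relations()` from the directory that contains the docker-compose file (with sudo if your Docker setup requires it, as in the commands above) should print the 20 largest relations with their sizes.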