usc-isi-i2 / dig-etl-engine

Download DIG to run on your laptop or server.
http://usc-isi-i2.github.io/dig/
MIT License

The `Action` view in myDIG is messed up. See attached pic #222

Closed saggu closed 6 years ago

saggu commented 6 years ago

KG Stats for `cfr.org`

saggu commented 6 years ago

This is after I deleted all the documents with tld = cfr.org

GreatYYX commented 6 years ago

From ES manually? How does _db.json look? Have you changed it manually before?

saggu commented 6 years ago

Yes, from ES manually. Haven't touched _db.json today. Curious how that query works, the one that gets the KG stats to show in myDIG?

GreatYYX commented 6 years ago

complete query: https://github.com/usc-isi-i2/mydig-webservice/blob/master/ws/ws.py#L2245

GreatYYX commented 6 years ago

Basically, myDIG gets the KG and KG original counts from ES directly and full-joins them with the TLDs that appear in the catalog. KG original is filtered on the term created_by: etk.
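To make the full join concrete, here is a minimal sketch in Python. It is not the code in ws.py; the function name `full_join_tld_stats`, the input shapes (plain dicts of TLD to count and a set of catalog TLDs), and all the numbers are assumptions for illustration only.

```python
def full_join_tld_stats(kg_counts, kg_original_counts, catalog_tlds):
    """Full outer join of per-TLD counts with the TLDs listed in the catalog.

    Hypothetical helper: a TLD shows up in the result if it appears in
    ANY of the three inputs; missing counts default to 0.
    """
    all_tlds = set(kg_counts) | set(kg_original_counts) | set(catalog_tlds)
    return {
        tld: {
            "kg": kg_counts.get(tld, 0),
            "kg_original": kg_original_counts.get(tld, 0),
            "in_catalog": tld in catalog_tlds,
        }
        for tld in sorted(all_tlds)
    }


# Made-up inputs, only to show the shape of the join.
kg = {"cfr.org": 3, "example.com": 1}
kg_original = {"cfr.org": 5}
catalog = {"cfr.org", "other.org"}

stats = full_join_tld_stats(kg, kg_original, catalog)
for tld, row in stats.items():
    print(tld, row)
```

One consequence of a full join is that a TLD with documents in ES still gets a row even when it is absent from the catalog, and the other way around.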

saggu commented 6 years ago

Also, what is happening here?

saggu commented 6 years ago

What is happening here ^ ?

GreatYYX commented 6 years ago

Can you try the above query in Kibana?

saggu commented 6 years ago

Result of

{
  "query": {
    "term": {
      "knowledge_graph.website.key": {
        "value": "cfr.org"
      }
    }
  }
}

is 1222

and result of

POST sage_kg/_search
{
  "aggs": {
    "group_by_tld_original": {
      "filter": {
        "bool": {
          "must_not": {
            "term": {
              "created_by": "etk"
            }
          }
        }
      },
      "aggs": {
        "grouped": {
          "terms": {
            "field": "tld.raw",
            "size": 2147483647
          }
        }
      }
    },
    "group_by_tld": {
      "terms": {
        "field": "tld.raw",
        "size": 2147483647
      }
    }
  },
  "size": 0
}

puts that number at 5872.

Looks wrong

GreatYYX commented 6 years ago

myDIG's query is based on the TLD in the CDR; your term query is based on the website in the KG.
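A toy illustration of why the two counts need not agree, counting over an in-memory list instead of ES. The documents below are invented and flattened (a plain `tld` key stands in for `tld.raw`), and the helper names are hypothetical; this only shows the idea and says nothing about the actual 1222 vs 5872 gap.

```python
def count_by_tld(docs, tld):
    """Count documents whose CDR-level TLD matches."""
    return sum(1 for doc in docs if doc.get("tld") == tld)


def count_by_website(docs, site):
    """Count documents whose extracted KG website matches."""
    count = 0
    for doc in docs:
        websites = doc.get("knowledge_graph", {}).get("website", [])
        if any(w.get("key") == site for w in websites):
            count += 1
    return count


# Invented documents: the second one has a TLD but no extracted website.
docs = [
    {"tld": "cfr.org", "knowledge_graph": {"website": [{"key": "cfr.org"}]}},
    {"tld": "cfr.org", "knowledge_graph": {}},
    {"tld": "cfr.org", "knowledge_graph": {"website": [{"key": "cfr.org"}]}},
    {"tld": "example.com", "knowledge_graph": {"website": [{"key": "example.com"}]}},
]

print(count_by_tld(docs, "cfr.org"))      # counts on the CDR field
print(count_by_website(docs, "cfr.org"))  # counts on the KG field
```

Any document that has a TLD but no matching website in its knowledge graph is counted by the first function and skipped by the second.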

GreatYYX commented 6 years ago

By the way, in the latest myDIG, delete-by-TLD is also based on tld.raw; knowledge_graph.website is not sufficient in some cases (e.g., if no website is extracted).
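For anyone deleting manually, a sketch of a request body keyed on tld.raw rather than on the website field. This is an assumption about what such a query could look like, not myDIG's actual implementation; the helper name `delete_by_tld_body` is hypothetical, and it presumes the body would be sent to Elasticsearch's `_delete_by_query` endpoint on the project index, if your ES version provides it.

```python
import json


def delete_by_tld_body(tld):
    """Build a term query on tld.raw (hypothetical helper, body only).

    Nothing is sent to Elasticsearch here; the dict is just the JSON
    body you would pass to a delete-by-query request.
    """
    return {"query": {"term": {"tld.raw": {"value": tld}}}}


body = delete_by_tld_body("cfr.org")
print(json.dumps(body, indent=2))
```

Running the same body against `_search` first is a cheap way to check how many documents it would match before deleting anything.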

GreatYYX commented 6 years ago

@saggu Close it? What about #190?

saggu commented 6 years ago

Yeah, I guess. Closing this