hydrusnetwork / hydrus

A personal booru-style media tagger that can import files and tags from your hard drive and popular websites. Content can be shared with other users via user-run servers.
http://hydrusnetwork.github.io/hydrus/

Tag Insights report generator #462

Open bbappserver opened 4 years ago

bbappserver commented 4 years ago

It would be nice to be able to get various statistics about tags for the hashes I have on file (not the PTR in general). The report should support common export formats like CSV for further analysis. I don't need these statistics calculated all the time, but an on-demand report generation button would be ideal.

Wizard Screens

Tag Services
(*) All combined: local, remote and virtual
( ) Selection:
  [*] Local
  [*] Local virtual
  [ ] Remote
Hash Services
[*] Local hashes
[ ] All hashes
From
(*) All namespaces
( ) Single namespace [____] (blank for global)
( ) Single namespace and global [____]
# Or show a list view of namespaces, if you are a masochist
Calculate
(*) tag-hash incidence count
( ) tag-tag coincidence count
  [*] ignore direct coincidence from parent relationships
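
The wizard choices above could be carried around as one small options object. This is only a sketch: `ReportOptions`, `Calculation` and every field name are made up for illustration and are not existing hydrus code.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Calculation(Enum):
    TAG_HASH_INCIDENCE = "tag-hash incidence count"
    TAG_TAG_COINCIDENCE = "tag-tag coincidence count"


@dataclass(frozen=True)
class ReportOptions:
    # Hypothetical names throughout; none of these exist in hydrus.
    tag_services: frozenset = frozenset({"local", "local virtual", "remote"})
    local_hashes_only: bool = True
    namespace: Optional[str] = None   # None means all namespaces
    include_global: bool = False      # also include unnamespaced tags
    calculation: Calculation = Calculation.TAG_HASH_INCIDENCE
    ignore_parent_coincidence: bool = True

    def namespaces_to_query(self):
        """Return the namespaces to gather, or None for all of them."""
        if self.namespace is None:
            return None
        if self.include_global and self.namespace != "":
            return (self.namespace, "")
        return (self.namespace,)


opts = ReportOptions(namespace="creator", include_global=True)
print(opts.namespaces_to_query())
```

Here the "Single namespace and global" choice maps to `include_global=True`, which matches the `(namespace, '')` loop in the example algorithm.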

Example Algorithm

Gather

tag_domain = COMBINED
hash_domain = LOCAL
namespace = "creator"

r = set()
for ns in (namespace, ''):  # the chosen namespace, then global (unnamespaced) tags
  # Each call returns a set of (hash_id, tag_id, namespace_id) rows.
  rs = hashes_join_real_tags(tag_domain=tag_domain, hash_domain=hash_domain, namespace=ns)
  rsv = hashes_join_virtual_tags(tag_domain=tag_domain, hash_domain=hash_domain, namespace=ns)

  # The tuple-like rows must implement __hash__ and __eq__ for set to work properly.
  # Combine the results, removing any duplication.
  r = r.union(rs.union(rsv))
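
To make the de-duplication concrete, here is a minimal runnable sketch of the gather step. It assumes the rows are plain namedtuples and stands in lists for the two query results; `Mapping` and `gather` are illustrative names, not hydrus functions.

```python
from collections import namedtuple

# Hypothetical row type; namedtuples hash and compare by value,
# so duplicates collapse when rows are put in a set.
Mapping = namedtuple("Mapping", ["hash_id", "tag_id", "namespace_id"])


def gather(real_rows, virtual_rows):
    """Union real and virtual (hash, tag, namespace) rows, dropping duplicates."""
    return set(real_rows) | set(virtual_rows)


real = [Mapping(1, 10, 2), Mapping(1, 11, 0), Mapping(2, 10, 2)]
virtual = [Mapping(1, 10, 2), Mapping(2, 12, 0)]  # first row duplicates a real one

rows = gather(real, virtual)
print(len(rows))  # 4 distinct rows out of 5 inputs
```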

Count

d = {}
for t in r:
  k = t.tag  # a subtag and a namespace wrapped in a hashable (keyable) object
  if k in d:
    d[k] += 1
  else:
    d[k] = 1
csvwriter.writerow(['namespace', 'subtag', 'count'])
for k in d:
  csvwriter.writerow([k.namespace_string, k.subtag_string, d[k]])
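
The same count can be written with the standard library's `collections.Counter` and `csv.writer`. This is a sketch that assumes each gathered row has already been resolved to a `(hash_id, namespace, subtag)` tuple of strings; `write_incidence_csv` is an illustrative name.

```python
import csv
import io
from collections import Counter


def write_incidence_csv(rows, out):
    """rows: iterable of (hash_id, namespace, subtag) tuples, already de-duplicated."""
    counts = Counter((namespace, subtag) for _hash_id, namespace, subtag in rows)
    writer = csv.writer(out)
    writer.writerow(["namespace", "subtag", "count"])
    # most_common() yields the most frequent tags first.
    for (namespace, subtag), count in counts.most_common():
        writer.writerow([namespace, subtag, count])
    return counts


buf = io.StringIO()
counts = write_incidence_csv(
    {(1, "creator", "alice"), (2, "creator", "alice"), (2, "", "blue sky")},
    buf,
)
print(buf.getvalue())
```

Writing to any file-like object keeps the function usable both for an export-to-file button and for tests.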

Coincidence

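Just use hydrus's regular logic to convert the (hash, tag_id) pairs into a list of tags per hash (tags[h] below), then count every pair of tags that share a hash: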

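d = {}
for h in hashes:
  for i, t1 in enumerate(tags[h]):
    # Pair t1 only with the tags after it, so each pair is counted once per hash.
    for t2 in tags[h][i + 1:]:
      if t1 not in d:
        d[t1] = {}
      if t2 in d[t1]:
        d[t1][t2] += 1
      else:
        d[t1][t2] = 1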
You get the idea for the CSV writing:

csvwriter.writerow([t1, t2, d[t1][t2] if t1 in d and t2 in d[t1] else 0])

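
For reference, a compact version of the coincidence count using `itertools.combinations`, including the "ignore direct coincidence from parent relationships" checkbox. It is a sketch under assumptions: tags are plain strings, and `parents` is a hypothetical dict from a child tag to its direct parents, not hydrus's real parent storage.

```python
from collections import Counter
from itertools import combinations


def coincidence_counts(tags_by_hash, parents=None):
    """Count how many files each unordered pair of tags appears on together.

    tags_by_hash: dict mapping hash -> iterable of tags.
    parents: optional dict mapping child tag -> set of its direct parent tags
             (hypothetical shape); such pairs are skipped when given.
    """
    parents = parents or {}
    counts = Counter()
    for tags in tags_by_hash.values():
        # sorted() gives each pair one canonical order; set() drops repeats.
        for t1, t2 in combinations(sorted(set(tags)), 2):
            if t2 in parents.get(t1, ()) or t1 in parents.get(t2, ()):
                continue
            counts[(t1, t2)] += 1
    return counts


tags_by_hash = {
    "h1": ["blue sky", "cloud", "sky"],
    "h2": ["blue sky", "sky"],
    "h3": ["cloud", "sky"],
}
pairs = coincidence_counts(tags_by_hash, parents={"blue sky": {"sky"}})
print(pairs)
```

Keying on a sorted pair stores each combination once, so the CSV has one row per pair rather than a row for each direction.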
rachmadaniHaryono commented 4 years ago

related

https://hydrus.tumblr.com/post/187016946869/heres-the-stats-from-the-previous-post-i-think

The above are statistics from the PTR, but I want some of those, plus what OP described, in hydrus itself.

https://64.media.tumblr.com/7a64fee2269d74f135d85f69424c231e/tumblr_pw98wp0xnD1qznht1o1_500.png