haystack / eyebrowse-server

Eyebrowser Server
http://eyebrowse.csail.mit.edu/
MIT License
27 stars 7 forks source link

Initialize tags app and data model #162

Closed jesszwang closed 7 years ago

jesszwang commented 7 years ago

Adding the Django tags app for highlighting and tagging inline in articles.

amyxzhang commented 7 years ago

any way we can integrate this more with the existing Eyebrowse data model? For instance it might be useful for Article to be linked with EyeHistory and PopularHistory. This would make it easy to, for instance, mark them in the Eyebrowse feed, which is currently a feed of EyeHistory and PopularHistory items. Maybe what we need is a generic Page table and Domain table, with EyeHistory and PopularHistory referencing these and Article as a subclass of Page.

Also I wonder how much we can integrate with the existing Tag model, which has some existing features in the tool. For instance, conceptually I see values and topics as specific instances of tags. And one could tag a domain, a page, or a highlight. It would be great if the model could reflect this flexibility.

jesszwang commented 7 years ago

I think it makes sense to have a Page object - let me look into refactoring for that.

jesszwang commented 7 years ago

For the Tag model, does it make sense for it to look like the following?

class Tag:
    user
    name
    description
    color 

    Page (foreignkey) # null if domain is populated
    Domain (foreignkey) # null if page is populated

class Topic(Tag):
    position

class ValueTag(Tag):
    vote_count 
    highlight (foreignkey)
amyxzhang commented 7 years ago

I'm guessing position refers to stance on a topic? I noticed in your screenshots that you'll be looking at pro and anti for the moral values as well...are you still planning on doing that or leaving that off? Having that stance for moral values could be interesting but could also be more noisy if we're slicing and dicing those dictionaries...

I think voting and tagging a highlight could be something that is generalizable to other tags as well. Also, I could see the use case for tagging an entire page or domain as a particular Value.

Take a look at how we implemented blacklists and whitelists for an example of how to do it in Django when you get to it: https://github.com/haystack/eyebrowse-server/blob/master/api/models.py

I think it also makes sense to add a private field to the tag given that the tags we have so far implemented are private ones.

jesszwang commented 7 years ago

Ah, the mockup copy is a bit misleading there. For moral values, the pro/con-ness is indicated by the value itself (e.g. fairness vs cheating, care vs harm). So moral values don't have such a valence.

I like the idea of generalizable features, but it seems like it'd require a lot of refactoring of all the existing eyebrowse code before the app could function properly again if we make a number of changes to the existing data model now. I'm in favor of augmenting the classes though, and then if time permits later, helping modify the controller functions. How is this for more generalizable classes?

class Tag(models.Model):
    user
    name 
    description
    color   
    is_private
    domain (current field, is a url string)

    Page (foreignkey) # null if domain is populated
    Domain (foreignkey) # null if page is populated

    Highlight (foreignkey)
    vote_count

class Topic(Tag):
    position
class Domain(models.Model):
    name 
    url 

class Page(models.Model):
    url 
    Domain (foreignkey)

class Article(Page):
    title

class Highlight(models.Model):
    date 
    highlight
    Article (foreignkey)
amyxzhang commented 7 years ago

That looks good, although I think you could still have a Values class that subclasses Tag without any extra fields.

I meant that in the moral values literature, you can treat the moral values as the 5 overall categories (care/harm, etc.) or you can dive deeper to 10, with each value having a pro- and anti- version (care, harm, etc.). I don't think it really matters how you model it but I was just expressing concern that moving to the pro and anti version might reduce the signal of the classifier. We can talk offline about this.

jesszwang commented 7 years ago

Hmm I looked into refactoring EyeHistory and PopularHistory code if we delete the fields present in the new Page object, and it seems to be a fairly involved task that might be worth separating into a separate pull request?

amyxzhang commented 7 years ago

That's fine if it's a separate pull request. But I'm against submitting any code now that leaves the system in a broken state, if it's possible for you to avoid that.

amyxzhang commented 7 years ago

Can you clean up the migration scripts? It looks like there are several, with some deleting a field and then adding it back? Probably all your changes can go into 1 migration.

amyxzhang commented 7 years ago

@jesszwang let me know if you'd like me to look at this again.

jesszwang commented 7 years ago

@amyxzhang yes please!

amyxzhang commented 7 years ago

I'm still seeing a lot of migration scripts that seem to be deleting and then adding the same field back in (Tag.domain and Tag.is_private for instance), which seems weird. Why are we doing that? Also any reason why all the changes can't just be 1 migration script?

jesszwang commented 7 years ago

Ah my bad, forgot about the api migrations. Those should all be cleaned up now, look again @amyxzhang?

amyxzhang commented 7 years ago

Hey @jesszwang, I've merged this in. I agree with your prior comment that we'll need another pull request to actually delete fields and move the unnecessary parts of EyeHistory, etc. to the Page and Domain tables and also just populate these fields for every existing row that connects to them. Then we can get rid of the null=True line for several of these foreign keys since that shouldn't actually happen. One way to do this if you're nervous about using South is just writing a python script to move all those values over (see scripts/ folder), and change the code to refer to the new fields. Then once we know that's working, we can more confidently delete the old fields using South.