Closed simongray closed 8 years ago
One way of creating topic groups might be to compare statements directly using semantic similarity and then create topical paragraphs/lists based on these statements.
I need to be able to narrow down statements to only the interesting ones. Currently, I'm trying to find ways in which statements can be said to be interesting or not:
Create a set of patterns to collect the most useful information for the profile! Stuff like occupation, hobbies, likes and dislikes, etc.
I will need to unpack the embedded statements within the statements that have "I" or "we" as the subject. Perhaps a good way of doing it is to use patterns to find specific statements first and then, based on the pattern, treat the embedded statements differently. E.g. "I like/love/prefer [embedded statement]" could be a pattern.
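A minimal sketch of the "I like/love/prefer [embedded statement]" pattern follows. It matches plain strings with a regular expression as a stand-in for matching on parsed statements; the names `PREFERENCE` and `unpack_preference` are hypothetical and not taken from the project.

```python
import re

# Hypothetical pattern: a first-person subject, a preference verb,
# then everything else as the embedded statement.
PREFERENCE = re.compile(
    r"^(?P<subject>I|we)\s+(?P<verb>like|love|prefer)\s+(?P<embedded>.+?)[.!?]?$",
    re.IGNORECASE,
)


def unpack_preference(text):
    """Return (subject, verb, embedded statement), or None if there is no match."""
    match = PREFERENCE.match(text.strip())
    if match is None:
        return None
    return (
        match.group("subject").lower(),
        match.group("verb").lower(),
        match.group("embedded"),
    )


print(unpack_preference("I love cycling to work."))
print(unpack_preference("Remote work is possible"))
```

Once the verb is known, the embedded part can be handled differently per pattern, e.g. stored as a like or dislike in the profile.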
(from Google Keep)
It might make a lot of sense to spend at least 1 day trying to get #29 working to see if it improves results.
A generalised way to both rank statements and to get relational results when comparing profiles:
Statements are ranked as usual by information density. General multipliers are also applied based on a heuristic, e.g. statements that came from embedded statements in the "I think" pattern get a bonus, statements that fit some personal information pattern get a bonus, or statements that express positive emotion get a bonus.
When comparing to a different profile, matching statements are sent to the top, while other statements get a bonus based on a heuristic, e.g. statements with a shared component (not counting boring ones) or statements from shared subreddits.
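The ranking scheme described above could be sketched roughly as follows. The `Statement` fields, the tag names, the multiplier values and the `BORING` set are all assumptions made for illustration; only the overall idea (density times heuristic bonuses, with matches sent to the top when comparing profiles) comes from the notes above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    text: str
    density: float
    components: frozenset = frozenset()  # primary words of the components
    subreddit: str = ""
    tags: frozenset = frozenset()        # names of the patterns that matched


# Assumed multipliers; the real values would need tuning.
BONUSES = {"i_think": 1.2, "personal_info": 1.5, "positive_emotion": 1.1}
BORING = frozenset({"thing", "stuff", "lot"})  # assumed "boring" components


def base_score(statement):
    """Information density multiplied by every applicable heuristic bonus."""
    score = statement.density
    for tag in statement.tags:
        score *= BONUSES.get(tag, 1.0)
    return score


def relational_score(statement, other):
    """Score a statement relative to the statements of another profile."""
    if any(statement.text == o.text for o in other):
        return float("inf")  # matching statements go to the top
    score = base_score(statement)
    if any((statement.components & o.components) - BORING for o in other):
        score *= 1.3  # shared, non-boring component
    if statement.subreddit and any(statement.subreddit == o.subreddit for o in other):
        score *= 1.1  # shared subreddit
    return score


def rank(statements, other=None):
    """Rank by base score, or by relational score if another profile is given."""
    if other is None:
        return sorted(statements, key=base_score, reverse=True)
    return sorted(statements, key=lambda s: relational_score(s, other), reverse=True)
```

Exact text equality is only a placeholder for "matching"; a semantic comparison could be swapped in without changing the rest of the ranking.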
Results of commit 1a5be6e
2455 statements in first profile
5072 statements in second profile
shared: [exception, egg, year, branch, slower, understanding, unlikely, %, google, economy, community, exam, journalist, 1, 2, 3, selection, much, least, example, result, opposition, same, wikipedia, policy, hand, a, website, i, photo, weird, r, component, fast, norway, couple, republic, good, area, murder, minister, need, check, list, article, lack, useful, child, germany, representation, stretch, denmark, end, hard, mistake, citizenship, message, better, china, there, family, age, prime minister, movie, release, focus, variation, nothing, system, other, city, confident, guy, matter, ton, documentary, product, famous, question, cheap, picture, pastry, future, employee, subreddit, function, employer, all, comparison, law, unit, shame, chicken, university, today, any, minute, sport, reason, leg, english, poverty, courtesy, demand, difference, cut, worth, danes, :-rrb-, tariff, popular, situation, funny, religion, thing, principle, source, research, view, school, yourself, those, doubt, whatever, name, concept, book, impossible, show, house, leftovers, anything, shot, street, news, s-train, election, yes, engineer, way, what, geopolitics, time, happy, sweden, plan, case, philosophy, phone, light, style, exercise, tourism, lot, advisor, freedom, low, theme, means, more, great, wrong, porridge, choice, certain, stuff, statistics, small, door, luck, section, experience, day, combination, group, kind, history, illegal, both, most, market, important, effect, job, who, option, politics, part, point, bike, park, rest, process, amount, helsingør, clear, enough, meal, someone, third, mean, meat, attention, extent, bad, asylum, you, jump, knowledge, sure, easy, population, deal, hot, greenland, composition, term, ca, mine, mind, set, possible, right, food, austrians, series, language, faculty, bet, reply, criticism, which, image, she, take, month, blame, party, some, beginning, papers, related, skill, company, respects, home, form, developer, grant, he, big, practice, expert, 
bit, hope, norwegian, majority, model, text, french, likely, issue, large, sense, it, impression, government, world, man, ability, everything, belief, map, side, break, suggestion, machine, able, comparable, fun, bonus, while, that, high, solution, fine, host, different, level, author, feeling, relevant, true, datum, student, head, hour, consumer, fight, on, chance, nature, interesting, thread, due, pm, character, friend, stick, they, old, myself, something, quality, unique, foreigner, sub, access, industry, perspective, free, surprised, though, star, one, store, many, people, democracy, afraid, open, story, country, tv, available, uk, us, this, degree, look, idea, kid, rule, we, life, common, interest, themselves, definition, dane, socialist, figure, politician, technology, money, migration, step, comment, trend, type, loss, problem, review, price, work, itself, enjoyable, word, south, course, place, power, event, europe, danish, nice, opinion, culture, first, argument, perfect, hostel, minority, drug, justice, swedish, development, copenhagen, tax, person, safe, here, week, death, advice, scale, imo, car, intent, trade, response]
1218 interesting statements in first profile
r.qual.: {S+V+DO: "Different variation of meditation routines also help train social situational awareness", density: 0,91}
quality: {S+V+DO+IO: "Canada many refugees just resettle according to relatively tiny UN quotas", density: 0,91}
density: {S+V: "Capitalism is when is privately owned", density: 1}
r.qual.: {S+V+DO+IO: "Canada are really not taking in many refugees according to relatively tiny UN quotas", density: 0,86}
quality: {S+V+DO: "Different variation of meditation routines also help train social situational awareness", density: 0,91}
density: {S+V+E: "Most people drink avoid", density: 1}
r.qual.: {S+V+IO: "Typically these comparatively more generous welfare states are created by social democratic political parties", density: 0,86}
quality: {S+V: "Pretty sure high-end chip manufacturing has always been automated though", density: 0,9}
density: {S+V+DO: "Remote work is possible", density: 1}
I'm now working on moving anything subjective to the Profile class, so that the Statement class only reports back objective knowledge such as grammatical/lexical information or scores such as Lexical Density, which are based on objective criteria.
Quality/interestingness/relevance must all be seen in relation to the Profile and its needs. I am also moving over to using patterns to assess interestingness and wellformedness.
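The split between objective and subjective responsibilities might look something like the sketch below. The class layout, the `min_density` threshold and the use of regular expressions as patterns are assumptions for illustration and do not reflect the actual code in the repository.

```python
import re


class Statement:
    """Reports only objective information about itself."""

    def __init__(self, text, lexical_density):
        self.text = text
        self.lexical_density = lexical_density  # computed elsewhere from objective criteria


class Profile:
    """Owns every subjective judgement, such as interestingness."""

    def __init__(self, interesting_patterns, min_density=0.5):
        # Hypothetical configuration: regex patterns plus a density threshold.
        self.interesting_patterns = [re.compile(p, re.IGNORECASE) for p in interesting_patterns]
        self.min_density = min_density
        self.statements = []

    def add(self, statement):
        self.statements.append(statement)

    def is_interesting(self, statement):
        """A statement is interesting only relative to this profile's needs."""
        if statement.lexical_density < self.min_density:
            return False
        return any(p.search(statement.text) for p in self.interesting_patterns)

    def interesting_statements(self):
        return [s for s in self.statements if self.is_interesting(s)]
```

With this layout, two profiles can hold the same `Statement` objects and still disagree about which of them are interesting.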
Finished a rudimentary version of this in time for the second study. Available in thesis hand-in commit: 47c0cf27f6cb46ba4a7891ef6dc0f4b9e91ab304
(needed for user testing)
This should probably be a simple data structure that contains statements and is able to search through them in a relatively fast way, perhaps using underlying hash maps, although to begin with it is probably sufficient to simply keep a set or a list.
(One way of organising with hash maps could be to have 4 maps organised by the primary word in each pure component, i.e. the key is the primary word and the value is the list of statements which have that primary word for that component type.)
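A rough sketch of the hash map layout, assuming the four component types are S, V, DO and IO (the labels seen in the results above); the class name `StatementIndex` and its methods are hypothetical:

```python
from collections import defaultdict

# Assumed component types; one map is kept per type.
COMPONENT_TYPES = ("S", "V", "DO", "IO")


class StatementIndex:
    """Stores statements and looks them up by the primary word of a component."""

    def __init__(self):
        self._all = []
        self._maps = {ctype: defaultdict(list) for ctype in COMPONENT_TYPES}

    def add(self, statement, primary_words):
        """primary_words maps a component type to its primary word, e.g. {"S": "work"}."""
        self._all.append(statement)
        for ctype, word in primary_words.items():
            if ctype in self._maps:  # ignore component types that are not indexed
                self._maps[ctype][word.lower()].append(statement)

    def find(self, ctype, word):
        """Return the statements whose component of this type has this primary word."""
        return list(self._maps.get(ctype, {}).get(word.lower(), []))

    def __len__(self):
        return len(self._all)
```

Lookups by primary word are then a single dictionary access, while the plain list keeps the simple "set or list" behaviour available as a fallback.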
It should be able to