Closed jywarren closed 10 years ago
I think you got this. Just to confirm, we'll need to rerun this script every so often?
On Tue, May 27, 2014 at 6:59 AM, Jeffrey Warren notifications@github.comwrote:
Ah, i think i solved a big problem. It was deleting too many tags, as fixed in the commit i just made; we should now see far fewer orphaned deletions and the missing "warren" tag should show up. Re-importing db now.
— Reply to this email directly or view it on GitHubhttps://github.com/jywarren/plots2/issues/181#issuecomment-44279654 .
Oh hey, one thing to look out for: tags with bar (|). I notice a log of spam tags have that. It might be worth checking your resulting tag database for any tags with a bar to see how well the spam tags were removed.
On Tue, May 27, 2014 at 11:31 AM, Bryan btbonval@gmail.com wrote:
I think you got this. Just to confirm, we'll need to rerun this script every so often?
On Tue, May 27, 2014 at 6:59 AM, Jeffrey Warren notifications@github.comwrote:
Ah, i think i solved a big problem. It was deleting too many tags, as fixed in the commit i just made; we should now see far fewer orphaned deletions and the missing "warren" tag should show up. Re-importing db now.
— Reply to this email directly or view it on GitHubhttps://github.com/jywarren/plots2/issues/181#issuecomment-44279654 .
ah, good call. I could add a line to remove all of them, and start disallowing them? What do you think?
The migration script will only need to be run once. All new tag creation
using Node.add_tag() will ensure no duplicate DrupalTags from here out. I
haven't added a validation to make duplicate DrupalTags fail, but we could
add that too in case someone does not use add_tag(), but it wouldn't
prevent someone from doing it in SQL. It'd be nice to have a "best
practice" of using add_tag, but if you don't think it's too draconian, i
can add the uniqueness validator, which is something like validates :uniqueness => :name
On Tue, May 27, 2014 at 2:40 PM, Bryan Bonvallet notifications@github.comwrote:
Oh hey, one thing to look out for: tags with bar (|). I notice a log of spam tags have that. It might be worth checking your resulting tag database for any tags with a bar to see how well the spam tags were removed.
On Tue, May 27, 2014 at 11:31 AM, Bryan btbonval@gmail.com wrote:
I think you got this. Just to confirm, we'll need to rerun this script every so often?
On Tue, May 27, 2014 at 6:59 AM, Jeffrey Warren < notifications@github.com>wrote:
Ah, i think i solved a big problem. It was deleting too many tags, as fixed in the commit i just made; we should now see far fewer orphaned deletions and the missing "warren" tag should show up. Re-importing db now.
— Reply to this email directly or view it on GitHub< https://github.com/jywarren/plots2/issues/181#issuecomment-44279654> .
— Reply to this email directly or view it on GitHubhttps://github.com/jywarren/plots2/issues/181#issuecomment-44317448 .
In theory, I think those spam tags should be orphaned and disappear. I saw your orphan check ensures that the nodes are active and not banned/hidden.
If you find bars, we might want to leave them in place as a handy way to find spam posts and ban us some spam users.
As for uniqueness, I don't think it is Draconian. It's the opposite. Pushing database logic to the database is absolutely correct. A relational database is intended to guarantee coherence of the data, so duplication should be found and rejected by the database. Its the default notion of Rails that databases are all MySQL and don't do anything useful with their lives, but I digress. The database should absolutely take part in maintaining coherence. I believe :uniqueness
pushes that constraint to the database.
The problem is that the uniqueness constraint, even done up in the database, won't cross tables. It really would be better to have just the one table. The M2M thing creates a hidden, secret table, but it should suit the need just fine. I still don't understand what you meant by leaving options open to have the two tables.
Well, the Rails idiom, as you point out, is to have all the constraints in the model, so I'll go ahead and add that ( http://guides.rubyonrails.org/active_record_validations.html#uniqueness).
re: two tables, are you a proponent of just having a single tags table with
a name field, and when we want all entries for a tag, just calling
Tag.where(:name => "infragram")
? I prefer fewer tables, and like that
format. I had originally made a tags table to try to completely flatten out
those multiple tables, but more recently have just been trying to do one
thing at a time. If you like that, i'm open to doing that in a future
migration, but we should perhaps make a new issue since this one is getting
pretty nuts long.
On Tue, May 27, 2014 at 3:27 PM, Bryan Bonvallet notifications@github.comwrote:
The problem is that the uniqueness constraint, even done up in the database, won't cross tables. It really would be better to have just the one table. The M2M thing creates a hidden, secret table, but it should suit the need just fine. I still don't understand what you meant by leaving options open to have the two tables.
— Reply to this email directly or view it on GitHubhttps://github.com/jywarren/plots2/issues/181#issuecomment-44323066 .
Yeah a new issue would be good, especially in the publiclab repo.
I was just climbing on my soapbox and complaining about multiple tables. I helped out with some of the code initially with likes and so forth, which is similar to tags. I should start to pitch in again beyond advice from afar.
On Tue, May 27, 2014 at 12:36 PM, Jeffrey Warren notifications@github.comwrote:
Well, the Rails idiom, as you point out, is to have all the constraints in the model, so I'll go ahead and add that ( http://guides.rubyonrails.org/active_record_validations.html#uniqueness).
re: two tables, are you a proponent of just having a single tags table with a name field, and when we want all entries for a tag, just calling
Tag.where(:name => "infragram")
? I prefer fewer tables, and like that format. I had originally made a tags table to try to completely flatten out those multiple tables, but more recently have just been trying to do one thing at a time. If you like that, i'm open to doing that in a future migration, but we should perhaps make a new issue since this one is getting pretty nuts long.On Tue, May 27, 2014 at 3:27 PM, Bryan Bonvallet notifications@github.comwrote:
The problem is that the uniqueness constraint, even done up in the database, won't cross tables. It really would be better to have just the one table. The M2M thing creates a hidden, secret table, but it should suit the need just fine. I still don't understand what you meant by leaving options open to have the two tables.
— Reply to this email directly or view it on GitHub< https://github.com/jywarren/plots2/issues/181#issuecomment-44323066> .
— Reply to this email directly or view it on GitHubhttps://github.com/jywarren/plots2/issues/181#issuecomment-44324162 .
gosh yes, i feel like we could get this all done in like <15 tables, easily, and Drupal was using ~200. Did you see https://github.com/publiclab/plots2/pull/73 to remove the extra DrupalUrlAlias table?
On Tue, May 27, 2014 at 3:40 PM, Bryan Bonvallet notifications@github.comwrote:
Yeah a new issue would be good, especially in the publiclab repo.
I was just climbing on my soapbox and complaining about multiple tables. I helped out with some of the code initially with likes and so forth, which is similar to tags. I should start to pitch in again beyond advice from afar.
On Tue, May 27, 2014 at 12:36 PM, Jeffrey Warren notifications@github.comwrote:
Well, the Rails idiom, as you point out, is to have all the constraints in the model, so I'll go ahead and add that ( http://guides.rubyonrails.org/active_record_validations.html#uniqueness).
re: two tables, are you a proponent of just having a single tags table with a name field, and when we want all entries for a tag, just calling
Tag.where(:name => "infragram")
? I prefer fewer tables, and like that format. I had originally made a tags table to try to completely flatten out those multiple tables, but more recently have just been trying to do one thing at a time. If you like that, i'm open to doing that in a future migration, but we should perhaps make a new issue since this one is getting pretty nuts long.On Tue, May 27, 2014 at 3:27 PM, Bryan Bonvallet notifications@github.comwrote:
The problem is that the uniqueness constraint, even done up in the database, won't cross tables. It really would be better to have just the one table. The M2M thing creates a hidden, secret table, but it should suit the need just fine. I still don't understand what you meant by leaving options open to have the two tables.
— Reply to this email directly or view it on GitHub< https://github.com/jywarren/plots2/issues/181#issuecomment-44323066> .
— Reply to this email directly or view it on GitHub< https://github.com/jywarren/plots2/issues/181#issuecomment-44324162> .
— Reply to this email directly or view it on GitHubhttps://github.com/jywarren/plots2/issues/181#issuecomment-44324623 .
Let me take this comment box in an otherwise overly commented issue to state that I did see that PR and that I love the work that @sirMackk has been doing. One of these days we need to figure out how to give him some proper credit through our PL circles.
Agreed, long live @sirMackk! OK, pending this last import and re-check, we should be able to run this on the live database! Then on to @sirMackk's PR, woohoo!
On Tue, May 27, 2014 at 3:49 PM, Bryan Bonvallet notifications@github.comwrote:
Let me take this comment box in an otherwise overly commented issue to state that I did see that PR and that I love the work that @sirMackkhttps://github.com/sirMackkhas been doing. One of these days we need to figure out how to give him some proper credit through our PL circles.
— Reply to this email directly or view it on GitHubhttps://github.com/jywarren/plots2/issues/181#issuecomment-44325914 .
@btbonval @jywarren Thank you very much - being able to help out even a little bit is all the credit I need, honestly. The world sorely needs more initiatives like PublicLab and the great people who make it run.
Hmm, got a "duplicate entry" error;
Mysql2::Error: Duplicate entry '1-1-2363' for key 'PRIMARY': UPDATE `community_tags` SET `tid` = 1 WHERE `community_tags`.`tid` = 3199 AND `community_tags`.`nid` = 2363
Hopefully this is the last thing...
Hmm, made a fix to duplications but now spaces and decapitalizations arent working.
What commit? I can take a look if you think it'd help.
On Thu, May 29, 2014 at 9:19 AM, Jeffrey Warren notifications@github.comwrote:
Hmm, made a fix to duplications but now spaces and decapitalizations arent working.
— Reply to this email directly or view it on GitHubhttps://github.com/jywarren/plots2/issues/181#issuecomment-44550978 .
that last one. Grr. Thanks... i have to design a poster now but will check tonight. MapKnitter no space left issues also pressing :-(
Ah, i bet it failed because on the spaces/downcasing, now that we have a validation for uniqueness, a 2nd copy of the "Proven in the field" tag failed uniqueness when it collided with "proven-in-the-field" -- so it just stayed with spaces, and was then not deleted as a duplicate!
OK, on that premise i'm just going to disable the uniqueness validator until after the consolidation is run. We'll uncomment it then. This may be related to my current state of exhaustion, but it's a valid way to fix this without unduly complicating the migration code.
New reimport and migration test running... typically takes about 12 hrs as I place nice with the disk ... ionice
command FTW
Wow, it just completed and I think we are doing well. I do see that although the # of tags tallies, and it doesn't look like we've lost any communitytags, this tagged map seems to be missing from test.publiclab.org:
http://test.publiclab.org/tag/czech-republic vs http://publiclab.org/tag/czech-republic
hmm, actually that map has all tags in both versions. It just doesn't show up on the map sidebar for that tag. Odd.
OK, can I get some thumbs up to go forward with this? Thanks!
Thumb up only if you run a database backup of production before deploying, in which case we would have nothing to lose! :)
On Fri, May 30, 2014 at 8:18 AM, Jeffrey Warren notifications@github.com wrote:
OK, can I get some thumbs up to go forward with this? Thanks!
— Reply to this email directly or view it on GitHub https://github.com/jywarren/plots2/issues/181#issuecomment-44662671.
err wait if this takes all night to run, will that cause problems with race conditions? e.g. people adding or changing tags while the script is processing them?
On Fri, May 30, 2014 at 8:19 AM, Bryan btbonval@gmail.com wrote:
Thumb up only if you run a database backup of production before deploying, in which case we would have nothing to lose! :)
On Fri, May 30, 2014 at 8:18 AM, Jeffrey Warren notifications@github.com wrote:
OK, can I get some thumbs up to go forward with this? Thanks!
— Reply to this email directly or view it on GitHub https://github.com/jywarren/plots2/issues/181#issuecomment-44662671.
no, it only takes forever if a) we have to reimport the db and b) if we care about overusing disk io, which we wouldn't if we were on the production server anyways.
My worry is that we do it, then long after realize we lost some tags. But you're right, we could always go get them from the db dumps. OK!
Yeah that all sounds perfect then. backup, execute, recover if needed. Boom goes the dynamite.
On Fri, May 30, 2014 at 8:24 AM, Jeffrey Warren notifications@github.com wrote:
no, it only takes forever if a) we have to reimport the db and b) if we care about overusing disk io, which we wouldn't if we were on the production server anyways.
My worry is that we do it, then long after realize we lost some tags. But you're right, we could always go get them from the db dumps. OK!
— Reply to this email directly or view it on GitHub https://github.com/jywarren/plots2/issues/181#issuecomment-44663387.
We are good! Migrated. I'm going to close this and open a new one for the code cleanup!!!
remove spaced tags too!